Product, Result of Idea Collision

When I was asked how long the project took to complete, I also felt a little curious why the stock alert project, a simple project, could take so long. So I added up the task times recorded in my Google Doc. "Study" + "design" account for 70%+, while coding is only about 27%. My initial estimate for the project development was 3 months (90 days), and the actual coding time roughly matches it. The delay came mainly from "study".

(Task-time table; columns: Admin, Coding, Deployment, Design, Study, Testing, Grand Total)

Study! Yes, study. I studied so many things in order to address the technical issues. To me, the whole development process was more like a study process. I can still remember how I dozed off while watching the Google Cloud tutorial videos. :)

The following are the things I studied. I know the diagram is quite complex, which surprised me too. Initially I just wanted to come up with a simple list. But its complexity delivers the message: this is not an easy journey.

In my idol's words, a product is a collision of ideas. It starts from an unpolished idea, just like opening a window: you then enter a new world, which is itself full of windows. This belief also implies how important it is to give staff room for "mistakes", because a diamond is just a rock without polishing.

Wonderful article on JS Delete
Here’s a short summary of how deletion works in JavaScript:
  • Variables and function declarations are properties of either Activation or Global objects.
  • Properties have attributes, one of which — DontDelete — is responsible for whether a property can be deleted.
  • Variable and function declarations in Global and Function code always create properties with DontDelete.
  • Function arguments are also properties of Activation object and are created with DontDelete.
  • Variable and function declarations in Eval code always create properties without DontDelete.
  • New properties are always created with empty attributes (and so without DontDelete).
  • Host objects are allowed to react to deletion however they want.

Analysis vs. Design: What’s the Difference?


  • Unambiguous 
    • Interpretable in only one way 
  • Testable 
    • Compliance (or, non-compliance) can be clearly demonstrated
  • Binding 
    • The customer is willing to pay for it and is unwilling to not have it 
Every requirement that is still necessary in spite of “perfect technology” is an essential requirement.

Requirements about speed, cost, and capacity go into the design bucket

Requirements about reliability (MTBF, MTTR) go into the design bucket

Requirements about I/O mechanisms and presentations go into the design bucket

Requirements about computer languages go into the design bucket
Requirements about archiving go into the design bucket

Requirements about the customer's business policy / business process go into the essential bucket

UML for Analysis

UML for Design


Reduce apparent complexity: one large problem becomes two smaller ones 
  • Understand the customer’s business policy / business process 
  • Figure out how to automate that business policy / process with the available technology 
Isolate areas of expertise
Apply the principles of coupling and cohesion at the highest level of the software architecture 
  • More robust, less fragile systems 
  • Enable separate evolution of the business policy / business process and the implementation technology

Responsive Web Design - Study Memo



Recommended Media Screen Width

  • 320px
  • 480px
  • 600px
  • 768px
  • 900px
  • 1200px
Use em, a relative unit, instead of pixels for the font size.

Use fluid grid for responsive website



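Today I finished watching Tsui Hark's Detective Dee and the Mystery of the Phantom Flame (狄仁杰之通天帝国). It was... painful to watch! The story is too far-fetched and the plot develops without any logic. I really don't know what I was watching. It even includes an "unspoken rules" scene: "the Empress wants me to serve you, my lord" turns out to mean undressing!!! I nearly fainted... Does the entertainment industry have so many unspoken rules that a film feels abnormal without one? Even though old Dee has agreed to help, they still have to add a woman as insurance, in case he doesn't do the job!!! And if he really does the job, the Empress would still need a mistress to fight corruption!!!!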


Outside world is wonderful!!!

Today I finished studying the Require.js module concept and realized that what I was seeking is already implemented and that my direction is right: divide the whole website into modules (widgets) and load them when necessary. This idea emerged 2 years ago for the mobile stock trading system. Although it was my first web/mobile project, my experience in network system development made me focus more on the system's data volume and network traffic. Because of the tight schedule, I never fully implemented it. Only now, on my own system, have I decided to implement the dynamic loading feature, whose formal name should be Asynchronous Module Definition. After finishing the JS/CSS/HTML loading based on the jQuery widget object, I happened to re-study Require.js.

Then, shit! That is what I need: asynchronous module loading, i18n... And all of it is nicely implemented and well organised. After reading its history, I feel a little upset to have found this nice tool so late.

Instead of struggling over why the boss doesn't judge staff by performance, or how those guys can be so shameless as to take others' credit or shift the responsibility, these are the guys I am looking for. I suddenly recall my answer to my boss's question in the exit interview:

Boss: "Well, you can't work with people."
Me: "Well, how can I work with them?! Like the xxx incident: even when it is black and white, they can lie!!!"

For sure, I can't work with them. Because,

Outside world is so wonderful!  

Wednesday, November 20, 2013

Another HR Person

After Monday's lunch with the HR manager who is resigning from my ex-company, I wonder whether it is because HR people are overly "professional" that most companies can't build the kind of environment taught in the management books. They seem to lack basic sincerity. Maybe it comes with the job: they have to deal with all kinds of people and, you know, that area is not always under the sun.

Most HR people I have dealt with were not pleasant. On one hand they talk about being supportive, passion... all the glorious stuff; on the other hand, they don't really believe it.

I suddenly recall another HR person at my ex-company, who was so "cool" and emotionless when she processed your request, but would show you the warmest smile when talking about the company culture, the working environment...

So what happened this time? Shit, it is so fake!!!

I was touched when she insisted on a farewell with me when I left the company 6 months ago, and I really appreciated what she shared with me:

  1. Listening to people is the key to teamwork and cooperation, which is lacking in this company. And it is far from as easy as it sounds.
  2. Most HR people work for their beliefs. The company culture must match them, otherwise they will quit immediately.
  3. She is strong-minded and still believes she can improve the company culture. After all, she had joined the company only a little over a year before.
Wow! A wise and kind person! I like that. I even regret not getting to know such people earlier; it is such a big loss. But luckily, at the last minute, I still got the chance to catch some wisdom.

So when I recently learned that she was leaving, I was so surprised!!! My ex-company is losing talent again!? I couldn't help inviting her to lunch, hoping she was all right. Then I got all the following surprises. Is this the same person?

  1. The financial industry isn't her industry. She dislikes it and can't settle into it. In October (4 months after I left), she finally made it out.
  2. The company's main problem is that its decision process isn't clear and isn't professional, unlike her previous employers, NTUC child care, NUS...
  3. The company is full of empathy, although that may be an issue. And it has very strong teamwork, which you may miss once you get the chance to see big-name companies such as Apple from the inside, since they lack it.
At first I was shocked and couldn't get her points, and then I slowly realized what a fool I was! HR is just her job. Now I can confirm that my initial doubt about her sudden farewell invitation was right: she was just working for her boss, to confirm whether I left because of that staff firing incident and to collect my comments on the company.

But I am still inspired by "her wise" words: listening is the key to teamwork and cooperation. As my favourite saying goes, "No one can do anything to a person who keeps lying. It is just that in the end, he/she can't believe others anymore".

Such a pitiful job, HR is!

Wonderful Performance Metrics Tool

As my crawler project gets close to the end, my concern about performance grows heavier. How do I measure my system's performance? There is no easy answer. There are so many modules and so many parameters, and most important of all, the measurement itself must not affect the system performance or increase the module complexity.

Even for the JSON parsing function, I spent almost 1 working day coming up with the performance measurement, and that was just for unit testing ;( . The result looks like the following.

stockprice (36648000 records) elapsed ms:376806.531 for 36000 avg:9.937 variance:5.652 Fastest:9.000 Slowest:244.000
[67, 9 x 16910, 10 x 13029, 11 x 4306, 12 x 868, 73, 13 x 248, 14 x 123, 15 x 49, 17 x 15, 16 x 33, 19 x 9, 18 x 8, 21 x 26, 20 x 11, 23 x 51, 22 x 47, 25 x 43, 24 x 41, 27 x 39, 26 x 38, 29 x 24, 28 x 28, 31 x 3, 30 x 18, 34, 35 x 2, 32 x 2, 33 x 6, 38, 39 x 2, 36 x 6, 37 x 3, 42 x 2, 43, 41, 50, 48, 54, 244]

But how about the other functions? I was almost frightened by the future workload. It seemed my system launch day would need to be postponed. Then today, I found this wonderful library, Metrics, through the Netty example. It is fantastic and saves me a huge amount of time on performance measurement and reporting.

With just a few lines, the following result is automatically printed to the system console. If needed, it can easily output the result to CSV, a log file or JMX, and it even provides a servlet to pass the result remotely as JSON. Wonderful!!!

With all these tools, I almost have insurance on my system quality.

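// Build the console reporter from the shared registry and print a report every minute.
final ConsoleReporter reporter = ConsoleReporter.forRegistry(Metrics.defaultRegistry()).build();
reporter.start(1, TimeUnit.MINUTES);

// Timer backed by a sliding window that keeps the last nMax measurements.
Timer timer = Metrics.newTimer(this.getClass(),"StockPrice Batch Parse","timer",new SlidingWindowReservoir(nMax));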


-- Timers ----------------------------------------------------------------------
test.JSONParserTest.StockPrice Batch Parse.timer
             count = 34356
         mean rate = 95.49 calls/second
     1-minute rate = 95.70 calls/second
     5-minute rate = 88.47 calls/second
    15-minute rate = 80.06 calls/second
               min = 9.32 milliseconds
               max = 244.26 milliseconds
              mean = 10.46 milliseconds
            stddev = 2.38 milliseconds
            median = 10.07 milliseconds
              75% <= 10.64 milliseconds
              95% <= 11.99 milliseconds
              98% <= 13.57 milliseconds
              99% <= 22.14 milliseconds
            99.9% <= 31.03 milliseconds 

When I needed a better messaging protocol for my project DTCrawler, I found two newer protocols, SCTP and UDT. After studying them, especially through the book Networks for Grid Applications, I chose UDT because:

1. It is built upon UDP (which is my preference).
2. It provides flow/congestion management, which is necessary for the application.

The difference is like this. 

"UDT borrows the messaging and partial reliability semantics from SCTP. However, SCTP are specially designed for VoIP and telephony, but UDT targets general purpose data transfer. UDT unifies both messaging and streaming semantics in one protocol."

Java Performance Tuning Study Memo

Wonderful blogs from!!! List of articles

Java type memory usage

byte, boolean        1 byte
short, char          2 bytes
int, float           4 bytes
long, double         8 bytes
Byte, Boolean        16 bytes
Short, Character     16 bytes
Integer, Float       16 bytes
Long, Double         24 bytes

EnumSet, BitSet      1 bit per value
EnumMap              4 bytes (for value, nothing for key)
ArrayList            4 bytes (but may be more if ArrayList capacity is seriously more than its size)
LinkedList           24 bytes (fixed)
ArrayDeque           4 to 8 bytes, 6 bytes on average

JDK collection       Size                               Possible Trove substitution   Size
HashMap              32 * SIZE + 4 * CAPACITY bytes     THashMap                      8 * CAPACITY bytes
HashSet              32 * SIZE + 4 * CAPACITY bytes     THashSet                      4 * CAPACITY bytes
LinkedHashMap        40 * SIZE + 4 * CAPACITY bytes     None
LinkedHashSet        32 * SIZE + 4 * CAPACITY bytes     TLinkedHashSet                8 * CAPACITY bytes
TreeMap, TreeSet     40 * SIZE bytes                    None
PriorityQueue        4 * CAPACITY bytes                 None
All Java objects start with 8 bytes containing service information like object class and its identity hash code (returned by the System.identityHashCode method). Arrays have 4 more bytes (one int field) containing the array length. It looks like all user-written classes (not JDK classes) have another reference to the object Class. These fields are followed by all declared fields. All objects are aligned on an 8-byte boundary. All primitive fields must be aligned by their size (for example, chars should be aligned on a 2-byte boundary). An object reference (including any array) occupies 4 bytes. What does this mean for us? To get the most use of available memory, all object fields together must occupy N*8+4 bytes (4, 12, 20, 28 and so on). In this case 100% of the memory will contain useful data.
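The N*8+4 rule above can be checked with a little arithmetic. The sketch below assumes the layout the text describes (8 bytes of service information plus a 4-byte class reference, 8-byte alignment); the class and method names are made up for illustration, and a real JVM may lay objects out differently.

```java
final class ObjectSizeSketch {
    // Assumption taken from the text: 8 bytes of service information
    // plus a 4-byte reference to the object's Class.
    static final int HEADER_BYTES = 12;
    static final int ALIGNMENT = 8;

    /** Rounds header + declared field bytes up to the next 8-byte boundary. */
    static int shallowSize(int fieldBytes) {
        int raw = HEADER_BYTES + fieldBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Bytes lost to alignment padding for a given total of field bytes. */
    static int padding(int fieldBytes) {
        return shallowSize(fieldBytes) - HEADER_BYTES - fieldBytes;
    }

    public static void main(String[] args) {
        for (int fieldBytes : new int[] {4, 8, 12, 16, 20}) {
            System.out.println(fieldBytes + " field bytes -> "
                    + shallowSize(fieldBytes) + " bytes total, "
                    + padding(fieldBytes) + " bytes of padding");
        }
    }
}
```

Under these assumptions, field totals of 4, 12 and 20 bytes waste nothing, while 8 or 16 bytes of fields each cost 4 bytes of padding.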

Java Boxing Type Caching

Byte, Short, Long    From -128 to 127
Character            From 0 to 127
Integer              From -128 to java.lang.Integer.IntegerCache.high or 127, whichever is bigger
Float, Double        No caching
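A small sketch of what the caching ranges above mean in practice. The class name is hypothetical; it compares boxed values by identity on purpose, which is exactly what application code should avoid.

```java
final class BoxingCacheSketch {
    /** True when two boxings of the same int yield the very same object. */
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // identity comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        // Inside the cached range both boxings share one object.
        System.out.println("127 cached: " + sameBoxedInstance(127));
        // Outside the range the answer depends on IntegerCache.high,
        // so no particular result is assumed here.
        System.out.println("128 cached: " + sameBoxedInstance(128));
    }
}
```

The practical takeaway is to compare boxed values with equals(), since == only happens to work inside the cached range.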

Java Performance Tips

Never use exceptions as return code replacement or for any likely to happen events (especially in not IO-bound methods!). Throwing an exception is too expensive - you may experience 100 times slowdown for simple methods.

Throwing an exception in Java is a very slow operation. Expect that throwing an exception costs you something between 100 and 1000 ticks in most cases.
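As a rough illustration of the tip above, the sketch below contrasts a parser that uses an exception as its "bad input" return code with one that validates first and reports failure through its return value. The names are hypothetical and the validating variant handles only short, non-negative decimal strings.

```java
import java.util.OptionalInt;

final class ParseWithoutExceptions {
    /** Exception as return code: every bad input pays for a throw. */
    static int parseOrDefaultThrowing(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Validates first and returns empty for the expected bad case. */
    static OptionalInt tryParseDigits(String s) {
        // At most 9 digits, so the result always fits in an int.
        if (s == null || s.isEmpty() || s.length() > 9) {
            return OptionalInt.empty();
        }
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
            value = value * 10 + (c - '0');
        }
        return OptionalInt.of(value);
    }
}
```

Both variants give the same answers for valid input; they differ only in how much a likely failure costs.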

Case Study

1. Enable Log

SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

Memory Hierarchy

Get this from

It is very strange, but the test result shows that it is so.


innodb_version            5.6.13
protocol_version          10
version                   5.6.13-enterprise-commercial-advanced
version_compile_machine   x86_64
version_compile_os        osx10.7


SELECT udf_CreateCounterID(0,CURRENT_DATE);

SELECT @update,@updateend,@updatediff,@insertupdate,@insertupdate_end,@insertupdatediff,@keyval,@countlmt;

@update=2013-09-12 17:32:27
@updateend=2013-09-12 17:33:01

@insertupdate=2013-09-12 17:32:00
@insertupdate_end=2013-09-12 17:32:27



CREATE TABLE `sys_CounterID` (
  `exch_year` int(11) NOT NULL,
  `nextID` int(11) NOT NULL,
  PRIMARY KEY (`exch_year`)
);

Test Function

CREATE DEFINER=`root`@`localhost` FUNCTION `udf_CreateCounterID`(exchID SMALLINT, listyear DATE) RETURNS int(10) unsigned
BEGIN
 -- counter ID is 32 bits,
 -- highest 9 bits: exchange ID (until 2013, totally 317 operator MIC. for any >511, modular 512)
 -- middle 7 bits: 2 digits year (max:99)
 -- remaining bits: counter number
 DECLARE keyvalue INT;

 SET @countlmt = 1000000;
 SET keyvalue = ((exchID % 512) << 9 ) + EXTRACT(YEAR FROM listyear) % 100;

 SET @keyval = keyvalue;
 SET @retVal =  0;

 -- Test 1: INSERT ... ON DUPLICATE KEY UPDATE, repeated @countlmt times
 SET @count = @countlmt;
 SET @insertupdate = SYSDATE();

 WHILE @count > 0 DO

  INSERT INTO `sys_CounterID`(`exch_year`,nextID)
  VALUE( keyvalue, 1)
  ON DUPLICATE KEY UPDATE
   nextID = (@retVal := nextID + 1);

  SET @count = @count - 1;

 END WHILE;

 SET @insertupdate_end = SYSDATE();
 SET @insertupdatediff = TIMESTAMPDIFF(SECOND, @insertupdate,@insertupdate_end);

 -- Test 2: plain UPDATE, repeated @countlmt times
 SET @count = @countlmt;
 SET @update = SYSDATE();
 WHILE @count > 0 DO

  UPDATE sys_CounterID 
  SET nextID = (@retVal := nextID + 1)
  WHERE exch_year = keyvalue;

  SET @count = @count - 1;

 END WHILE;

 SET @updateend = SYSDATE();
 SET @updatediff = TIMESTAMPDIFF(SECOND, @update,@updateend);

 RETURN @retVal;
END
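
The 32-bit layout described in the function's comments (9 bits of exchange ID, 7 bits of two-digit year, the remaining 16 bits for the counter) can be sketched in Java as below. This is only an illustration of that comment: the class and method names are hypothetical, and it is not the key computed inside the SQL function, which uses a different shift.

```java
final class CounterIdSketch {
    // Assumed layout from the comment: [9 bits exchange][7 bits year][16 bits counter].
    static final int YEAR_SHIFT = 16;
    static final int EXCH_SHIFT = 23;

    /** Packs the three parts into one unsigned 32-bit value held in a long. */
    static long pack(int exchId, int year, int counter) {
        long exch = exchId % 512;      // wrap exchange IDs above 511, as in the comment
        long yy = year % 100;          // two-digit year, at most 99
        long cnt = counter & 0xFFFF;   // keep the low 16 bits of the counter
        return (exch << EXCH_SHIFT) | (yy << YEAR_SHIFT) | cnt;
    }

    static int exchangeOf(long id) { return (int) ((id >>> EXCH_SHIFT) & 0x1FF); }

    static int yearOf(long id) { return (int) ((id >>> YEAR_SHIFT) & 0x7F); }

    static int counterOf(long id) { return (int) (id & 0xFFFF); }

    public static void main(String[] args) {
        long id = pack(317, 2013, 42);
        System.out.println(exchangeOf(id) + " / " + yearOf(id) + " / " + counterOf(id));
    }
}
```

A long is used because Java's int is signed, so an exchange ID with the top bit set would otherwise print as a negative number.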


High Speed Concurrent Framework, Disruptor

I have gone through many articles about the Disruptor, devised by the LMAX team. It caught my eye because of this:

LMAX is a new retail financial trading platform. As a result it has to process many trades with low latency. The system is built on the JVM platform and centers on a Business Logic Processor that can handle 6 million orders per second on a single thread. The Business Logic Processor runs entirely in-memory using event sourcing. The Business Logic Processor is surrounded by Disruptors - a concurrency component that implements a network of queues that operate without needing locks. During the design process the team concluded that recent directions in high-performance concurrency models using queues are fundamentally at odds with modern CPU design.

Impressive, isn't it? :) But the above is a little misleading: it is not a single-threaded system. To me, the key to the Disruptor's achievement is that it brilliantly avoids the most common multi-threading traps. Today, instead of the old model of CPU, registers and memory, a high-performance program needs to deal with the CPU, registers and cache. To today's CPU, memory is like the hard disk of the old days. Another point is that CPUs now have multiple cores, and out-of-order execution will affect your program too. This great work proves again how important the fundamentals are, such as data structures, thread management in the OS and a deep understanding of the hardware.

Besides the LMAX articles, you can get insights directly from the Intel® 64 and IA-32 Architectures Optimization Reference Manual. The L1 cache has 2 types, data and instruction; the data cache is only 32 KB, and the cache line size is 64 bytes. The wiki provides a very comprehensive explanation of the cache line. Another article, from Microsoft on driver development, clearly states the common issues in multiprocessor architectures.

Sharing Is the Root of All Contention

From Herb Sutter's Dr. Dobb's blog

Cache Line & Atomic Operations


The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:
  • Guaranteed atomic operations
  • Bus locking, using the LOCK# signal and the LOCK instruction prefix
  • Cache coherency protocols that insure that atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors
These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor’s L1 or L2 caches, atomic operations can often be carried out inside a processor’s caches without asserting the bus lock. Here the processor’s cache coherency protocols insure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.

Memory Barrier Semantics

  • Acquire semantics mean that the results of the operation are visible before the results of any operation that appears after it in code. 
  • Release semantics mean that the results of the operation are visible after the results of any operation that appears before it in code.
  • Fence semantics combine acquire and release semantics. The results of an operation with fence semantics are visible before those of any operation that appears after it in code and after those of any operation that appears before it

Caching Issues

The hardware always reads an entire cache line, rather than individual data items. If you think of the cache as an array, a cache line is simply a row in that array: a consecutive block of memory that is read and cached in a single operation. The size of a cache line is generally from 16 to 128 bytes, depending on the hardware.

Each cache line has one of the following states:
  • Exclusive, meaning that this data does not appear in any other processor’s cache. When a cache line enters the Exclusive state, the data is purged from any other processor’s cache.
  • Shared, meaning that another cache line has requested the same data.
  • Invalid, meaning that another processor has changed the data in the line.
  • Modified, meaning that the current processor has changed the data in this line.
All architectures on which Windows runs guarantee that every processor in a multiprocessor configuration will return the same value for any given memory location. This guarantee, which is called cache coherency between processors, ensures that whenever data in one processor’s cache changes, all other caches that contain the same data will be updated. On a single-processor system, whenever the required memory location is not in the cache, the hardware must reload it from memory. On a multiprocessor system, if the data is not in the current processor’s cache, the hardware can read it from main memory or request it from other processors’ caches. If the processor then writes a new value to that location, all other processors must update their caches to get the latest data.
Some data structures have a high locality of reference. This means that the structure often appears in a sequence of instructions that reference adjacent fields. If a structure has a high locality of reference and is protected by a lock, it should typically be in its own cache line.
For example, consider a large data structure that is protected by a lock and that contains both a pointer to a data item and a flag indicating the status of that data item. If the structure is laid out so that both fields are in the same cache line, any time the driver updates one variable, the other variable is already present in the cache and can be updated immediately.

In contrast, consider another scenario. What happens if two data structures in the same cache line are protected by two different locks and are accessed simultaneously from two different processors? Processor 0 updates the first structure, causing the cache line in Processor 0 to be marked Exclusive and the data in that line to be purged from other processors’ caches. Processor 1 must request the data from Processor 0 and wait until its own cache is updated before it can update the second structure. If Processor 0 again tries to write the first structure, it must request the data from Processor 1, wait until the cache is updated, and so on. However, if the structures are not on the same cache line, neither processor must wait for these cache updates. Therefore, two data structures that can be accessed simultaneously on two different processors (because they are not protected by the same lock) should be on different cache lines.

Mechanical Sympathy

The first thing is to understand the CPU, your rice bowl: Mechanical Sympathy.

The following is from

Don't use locks

This will trap your program in "Amdahl's law" (of course, on the bad side). A lock causes execution context switching, ring3 -> ring0 -> ring3... Refer to Trisha's blog for more detail. But how do you avoid locks in a multi-threaded environment? The idea is "don't share the data". The shared resource is the reason the lock exists. That means you need to consider data segregation for a high-performance system. Their whole ring buffer design is built around this point.
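
To put a number on the "bad side" of Amdahl's law: if a fraction p of the work can run in parallel on n processors, the speedup is 1 / ((1 - p) + p / n). The sketch below just evaluates that formula; the class name is made up, and the serial fraction stands in for time spent holding a lock.

```java
final class AmdahlSketch {
    /** Amdahl's law: speedup for a given parallel fraction and processor count. */
    static double speedup(double parallelFraction, int processors) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // With 10% of the work serialized, extra cores give less and less.
        for (int n : new int[] {2, 4, 8, 16, 64}) {
            System.out.printf("%d cores -> %.2fx%n", n, speedup(0.9, n));
        }
    }
}
```

However many cores are added, the speedup stays below 1 / (1 - p), which is why shrinking the shared, locked part matters more than adding threads.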

Don't copy the data around for inter-thread communication

Their ring buffer is just like an infinite array, and the index is just like a reference pointer. So objects aren't copied around and no dynamic memory management is involved. Always calling new Object() is today's most common bad habit among OO programmers. I remember trying a similar idea before in my C++ project.
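
A minimal sketch of the idea, assuming one producer thread and one consumer thread. It is not the Disruptor's API; the class and its methods are invented here only to show preallocated slots being reused through ever-growing sequence numbers.

```java
final class RingBufferSketch {
    /** Mutable slot, allocated once and reused for every message. */
    static final class Entry { long value; }

    private final Entry[] entries;
    private final int mask;
    private volatile long published = -1; // last sequence visible to the consumer
    private volatile long consumed = -1;  // last sequence the consumer has finished

    RingBufferSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        entries = new Entry[sizePowerOfTwo];
        for (int i = 0; i < entries.length; i++) {
            entries[i] = new Entry(); // the only allocations, done up front
        }
        mask = sizePowerOfTwo - 1;
    }

    /** Producer side: returns false when every slot is still unread. */
    boolean tryPublish(long value) {
        long next = published + 1;
        if (next - consumed > entries.length) {
            return false; // writing now would overwrite an unread slot
        }
        entries[(int) (next & mask)].value = value; // write in place
        published = next; // volatile write makes the slot visible
        return true;
    }

    /** Consumer side: true when at least one published slot is unread. */
    boolean hasNext() { return consumed < published; }

    /** Consumer side: reads the next slot; call only after hasNext(). */
    long next() {
        long seq = consumed + 1;
        if (seq > published) {
            throw new IllegalStateException("nothing published yet");
        }
        long value = entries[(int) (seq & mask)].value;
        consumed = seq; // frees the slot for the producer
        return value;
    }
}
```

The sequence keeps counting upward (the "infinite array"), while the mask maps it onto the fixed set of slots.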

Some valuable points from Trisha's blog

Martin and Mike's QCon presentation gives some indicative figures for the cost of cache misses:

Latency from CPU to...                      Approx. number of CPU cycles   Approx. time in nanoseconds
Main memory                                                                ~60-80 ns
QPI transit (between sockets, not drawn)
L3 cache                                    ~40-45 cycles                  ~15 ns
L2 cache                                    ~10 cycles                     ~3 ns
L1 cache                                    ~3-4 cycles                    ~1 ns
Register                                    1 cycle

Cache Line

Volatile = Memory Barrier

This means if you write to a volatile field, you know that:
Any thread accessing that field after the point at which you wrote to it will get the updated value 
Anything you did before you wrote that field is guaranteed to have happened and any updated data values will also be visible, because the memory barrier flushed all earlier writes to the cache.
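
The two guarantees above are what make the common "write the data, then set a volatile flag" pattern work in Java. A minimal sketch with hypothetical names:

```java
final class VolatilePublishSketch {
    private int payload;            // plain field, no synchronization of its own
    private volatile boolean ready; // the volatile write/read acts as the barrier

    /** Writer: the plain write happens before the volatile write. */
    void publish(int value) {
        payload = value;
        ready = true;
    }

    /** Reader: spins until the flag is visible, then reads the payload. */
    int awaitPayload() {
        while (!ready) {
            Thread.yield();
        }
        return payload; // guaranteed to see the value written before ready = true
    }

    /** Runs one writer thread against the calling thread as reader. */
    static int demo() {
        VolatilePublishSketch box = new VolatilePublishSketch();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        int seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

If ready were not volatile, the reader could spin forever or observe the flag without the payload.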

False Sharing 

From Herb Sutter's blog:
The general case to watch out for is when you have two objects or fields that are frequently accessed (either read or written) by different threads, at least one of the threads is doing writes, and the objects are so close in memory that they're on the same cache line because they are:

  • objects nearby in the same array
  • fields nearby in the same object
  • objects allocated close together in time (C++, Java) or by the same thread (C#, Java)
  • static or global objects that the linker decided to lay out close together in memory;
  • objects that become close in memory dynamically, as when during compacting garbage collection two objects can become adjacent in memory because intervening objects became garbage and were collected; or
  • objects that for some other reason accidentally end up close together in memory.

First, we can reduce the number of writes to the cache line. For example, writer threads can write intermediate results to a scratch variable most of the time, then update the variable in the popular cache line only occasionally as needed. This is the approach we took in Example 2, where we changed the code to update a local variable frequently and write into the popular result array only once per worker to store its final count.

Second, we can separate the variables so that they aren't on the same cache line. Typically the easiest way to do this is to ensure an object has a cache line to itself that it doesn't share with any other data. To achieve that, you need to do two things:
  • Ensure that no other object can precede your data in the same cache line by aligning it to begin at the start of the cache line or adding sufficient padding bytes before the object.
  • Ensure that no other object can follow your data in the same cache line by adding sufficient padding bytes after the object to fill up the line.
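
In Java the second approach is usually approximated with padding fields. The sketch below assumes 64-byte cache lines and uses invented names; the JVM is free to reorder or drop unused fields, so this shows the intent and gives no guarantee about the actual layout.

```java
final class PaddedCounterSketch {
    /** A counter surrounded by unused longs so it tends to sit alone on its line. */
    static final class PaddedLong {
        long p1, p2, p3, p4, p5, p6, p7; // padding before (7 x 8 bytes)
        volatile long value;
        long q1, q2, q3, q4, q5, q6, q7; // padding after (7 x 8 bytes)
    }

    /** Two threads, each writing only its own counter; returns the combined total. */
    static long runTwoWriters(int incrementsPerThread) {
        PaddedLong a = new PaddedLong();
        PaddedLong b = new PaddedLong();
        // Each counter has exactly one writer, so value++ is safe here.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < incrementsPerThread; i++) a.value++;
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < incrementsPerThread; i++) b.value++;
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.value + b.value;
    }
}
```

Timing this against an unpadded version is the usual way to see whether false sharing was actually present on a given machine.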


Java Memory Model

Simple words always inspire people

Brian Goetz,

When evaluating a language feature, you need to examine both the cost and the benefit side of the proposal.

Benefit: how would having this feature enable me to write code that is better than what I can write today.

Cost: how would having this feature enable other people to write WORSE code than they might write today.

I like them! When we consider things or make decisions, we should always remind ourselves of the "cost". The "benefit" is just the lure that leads us into mistakes.

How HTTPS works, HTTP Tunneling & WebSocket


I finally understand how it works. HTTPS is just HTTP on top of SSL/TLS; HTTPS isn't a separate protocol at all. Every web proxy is just an HTTP proxy. The working flow is as follows:

Request message 
Client -> Proxy -> Server

Response message
Client <- Proxy <- Server

Because HTTP is just a clear-text message, the proxy is able to cache the data if the request is the same. This is clearly defined in the HTTP protocol.

The interesting part is HTTPS. I mistakenly believed it worked like HTTP, but in fact it is completely different. With HTTPS, the HTTP message is wrapped in an SSL message, so it can't be proxied or cached at all. It relies on HTTP tunneling.

CLIENT -> SERVER                        SERVER -> CLIENT
--------------------------------------  -----------------------------------
User-agent: Mozilla/4.0
<<< empty line >>>
                                        HTTP/1.0 200 Connection established
                                        Proxy-agent: Netscape-Proxy/1.1
                                        <<< empty line >>>
              <<< data tunneling to both directions begins >>>

From the above, :) we can easily tunnel any protocol through the proxy, such as SSH.
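
As a rough sketch of the client side of such a tunnel, the helper below builds the CONNECT request and checks the proxy's status line. The class and method names are hypothetical and no network I/O is done; a real client would write the request to the proxy socket and then run SSL, SSH or anything else over the same connection.

```java
final class ConnectRequestSketch {
    /** Builds the request a client sends to the proxy to open a tunnel. */
    static String connectRequest(String host, int port) {
        String target = host + ":" + port;
        return "CONNECT " + target + " HTTP/1.1\r\n"
             + "Host: " + target + "\r\n"
             + "\r\n"; // the empty line ends the request headers
    }

    /** True when the proxy's status line carries a 2xx code. */
    static boolean tunnelEstablished(String statusLine) {
        // Example: "HTTP/1.0 200 Connection established"
        String[] parts = statusLine.trim().split("\\s+");
        return parts.length >= 2
                && parts[1].length() == 3
                && parts[1].startsWith("2");
    }

    public static void main(String[] args) {
        System.out.print(connectRequest("example.com", 443));
        System.out.println(tunnelEstablished("HTTP/1.0 200 Connection established"));
    }
}
```

After a successful reply the proxy only relays bytes in both directions, which is why it cannot cache or inspect what passes through the tunnel.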


The interesting part is that WebSocket also relies on HTTP CONNECT when it needs to pass through a proxy.

URL format

 ws-URI = "ws:" "//" host [ ":" port ] path [ "?" query ]
 wss-URI = "wss:" "//" host [ ":" port ] path [ "?" query ]


client request:

        GET /chat HTTP/1.1
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
        Sec-WebSocket-Protocol: chat, superchat
        Sec-WebSocket-Version: 13

server response

        HTTP/1.1 101 Switching Protocols
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
        Sec-WebSocket-Protocol: chat
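
The Sec-WebSocket-Accept value in the response is derived from the client's Sec-WebSocket-Key: the server appends a fixed GUID defined by the WebSocket specification, hashes the result with SHA-1 and Base64-encodes the digest. A small sketch (the class name is made up) that reproduces the pair shown above:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class WebSocketAcceptSketch {
    // Fixed GUID from the WebSocket protocol specification (RFC 6455).
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Computes the Sec-WebSocket-Accept header value for a given key. */
    static String acceptFor(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptFor("dGhlIHNhbXBsZSBub25jZQ=="));
    }
}
```

The client performs the same computation and rejects the connection if the header does not match, which shows that the server really understood the WebSocket handshake.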

Message Frame

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+
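
Reading the header fields out of the frame layout above takes only a few bit operations. The sketch below uses invented names, does no bounds checking and does not unmask the payload; it only decodes FIN, the opcode, the MASK bit and the payload length.

```java
final class FrameHeaderSketch {
    final boolean fin;
    final int opcode;
    final boolean masked;
    final long payloadLength;
    final int headerLength; // bytes before the payload, masking key included

    private FrameHeaderSketch(boolean fin, int opcode, boolean masked,
                              long payloadLength, int headerLength) {
        this.fin = fin;
        this.opcode = opcode;
        this.masked = masked;
        this.payloadLength = payloadLength;
        this.headerLength = headerLength;
    }

    /** Decodes the header at the start of the given frame bytes. */
    static FrameHeaderSketch parse(byte[] frame) {
        int b0 = frame[0] & 0xFF;
        int b1 = frame[1] & 0xFF;
        boolean fin = (b0 & 0x80) != 0;    // first bit
        int opcode = b0 & 0x0F;            // low 4 bits of the first byte
        boolean masked = (b1 & 0x80) != 0; // first bit of the second byte
        long length = b1 & 0x7F;           // 7-bit payload length
        int offset = 2;
        if (length == 126) {               // 16-bit extended length follows
            length = ((frame[2] & 0xFFL) << 8) | (frame[3] & 0xFFL);
            offset = 4;
        } else if (length == 127) {        // 64-bit extended length follows
            length = 0;
            for (int i = 0; i < 8; i++) {
                length = (length << 8) | (frame[2 + i] & 0xFFL);
            }
            offset = 10;
        }
        if (masked) {
            offset += 4;                   // 4-byte masking key
        }
        return new FrameHeaderSketch(fin, opcode, masked, length, offset);
    }
}
```

For example, the bytes 0x81 0x05 followed by "Hello" describe a final, unmasked text frame with a 5-byte payload.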

CometD, Bayeux Server

CometD is a framework that implements the Bayeux protocol for Comet messaging. Refer to its documentation for the details.

Its 2.4 performance can be found here 

CometD components

Message flow

  1. It invokes BayeuxServer extensions (methods rcv() or rcvMeta()); if one extension denies processing, a reply is sent to the client indicating that the message has been deleted, and no further processing is performed for the message.

  2. It invokes ServerSession extensions (methods rcv() or rcvMeta(), only if a ServerSession for that client exists); if one extension denies processing, a reply is sent to the client indicating that the message has been deleted, and no further processing is performed for the message.

  3. It invokes authorization checks for both the security policy and the authorizers; if the authorization is denied, a reply is sent to the client indicating the failure, and no further processing is performed for the message.

  4. If the message is a service or broadcast message, the message passes through BayeuxServer extensions (methods send() or sendMeta()).

  5. It invokes server channel listeners; the application adds server channel listeners on the server, and offers the last chance to modify the message before it is eventually sent to all subscribers (if it is a broadcast message). All subscribers see any modification a server channel listener makes to the message, just as if the publisher has sent the message already modified. After the server channel listeners processing, the message is frozen and no further modifications should be made to the message. Applications should not worry about this freezing step, because the API clarifies whether the message is modifiable or not: the API has as a parameter a modifiable message interface or an unmodifiable one to represent the message object. This step is the last processing step for an incoming non-broadcast message, and it therefore ends its journey on the server. A reply is sent to publishers to confirm that the message made it to the server (see below), but the message is not broadcast to other server sessions.

  6. If the message is a broadcast message, for each server session that subscribes to the channel, the message passes through ServerSession extensions (methods send() or sendMeta()), then the server session queue listeners are invoked and finally the message is added to the server session queue for delivery.

  7. If the message is a lazy message (see Section 7.4.7, “Lazy Channels and Messages”), it is sent on first occasion. Otherwise the message is delivered immediately. If the server session onto which the message is queued corresponds to a remote client session, it is assigned a thread to deliver the messages in its queue through the server transport. The server transport drains the server session message queue, converts the messages to JSON and sends them on the conduit as the payloads of transport-specific envelopes (for example, an HTTP response or a WebSocket message). Otherwise, the server session onto which the message is queued corresponds to a local session, and the messages in its queue are delivered directly to the local session.

  8. For both broadcast and non-broadcast messages, a reply message is created, passes through BayeuxServer extensions and ServerSession extensions (methods send() or sendMeta()). It then passes to the server transport, which converts it to JSON through a JSONContext.Server instance (see Section 7.5.1, “JSONContext API”), and sends it on the conduit as the payload of a transport-specific envelope (for example, an HTTP response or a WebSocket message).

  9. The envelope travels back to the client, where the client transport receives it. The client transport converts the messages from the JSON format back to message objects, for the Java client via a JSONContext.Client instance (see Section 7.5.1, “JSONContext API”).

  10. Each message then passes through the extensions (methods send() or sendMeta()), and channel listeners and subscribers are notified of the message.
The round trip from client to server back to client is now complete.

AsyncHTTP, Comet

HTTP is a one-way, stateless protocol. To get real-time updates, we have to use polling. The article from Jetty lists the cost of polling, and it is huge. Luckily, we have Comet and the asynchronous Servlet 3.0 API.

The article from IBM has a very comprehensive introduction to the various Comet solutions, such as polling, long polling and streaming.

AJAX polling problem

Refer to original for the detail:
But there is a new problem. The advent of AJAX as a web application model is significantly changing the traffic profile seen on the server side. Because AJAX servers cannot deliver asynchronous events to the client, the AJAX client must poll for events on the server. To avoid a busy polling loop, AJAX servers will often hold onto a poll request until either there is an event or a timeout occurs. Thus an idle AJAX application will have an outstanding request waiting on the server which can be used to send a response to the client the instant an asynchronous event occurs. This is a great technique, but it breaks the thread-per-request model, because now every client will have a request outstanding in the server. Thus the server again needs to have one or more threads for every client and again there are problems scaling to thousands of simultaneous users.
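
The "hold the poll request until an event or a timeout" technique described above can be sketched with a blocking queue. The names are hypothetical and no servlet API is involved; note that this naive version parks one thread per waiting client, which is exactly the scaling problem the quote points out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class LongPollSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    /** Server side: an asynchronous event becomes available to the waiting poll. */
    void publish(String event) {
        events.add(event);
    }

    /** Handles one poll request: waits for an event or gives up after the timeout. */
    String poll(long timeoutMillis) {
        try {
            String event = events.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            return event != null ? event : "timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "timeout";
        }
    }

    public static void main(String[] args) {
        LongPollSketch server = new LongPollSketch();
        server.publish("price-update");
        System.out.println(server.poll(1000)); // answered at once
        System.out.println(server.poll(100));  // nothing pending, times out
    }
}
```

Asynchronous request handling exists to release the thread while the request waits and to resume it only when an event or the timeout arrives.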

(Comparison table of Web 1.0, Web 2.0 and Web 2.0 + Comet traffic; the values were not preserved. Rows: Burst period (s), Request duration (s), Poll duration (s), Request rate (req/s), Poll rate (req/s), Total (req/s), Concurrent requests, Min threads, Stack memory.)

HTTP, WebSocket, SPDY, HTTP/2.0 Evolution of Web Protocols

A very comprehensive doc on the technical evolution of HTTP.