Thursday, July 25, 2013

Google GAE Technical Points

0. Google GAE Architecture

GAE Physical Architecture and Benefits, Global Cache

Google Components Combination: GAE + Compute Engine + Services


1. SQL or NoSQL: Google App Engine DataStore


Quote from Objectify:
https://code.google.com/p/objectify-appengine/wiki/Concepts?tm=6
Since the datastore is conceptually a HashMap of keys to entities, and an entity is conceptually a HashMap of name/value pairs, your mental model of the datastore should be a HashMap of HashMaps!
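The "HashMap of HashMaps" picture can be sketched in a few lines of Python. This is only a mental model, not the App Engine API: the `datastore` dict, the `put`/`get` helpers and the `(kind, id)` key tuples are all hypothetical names chosen for illustration.

```python
# Conceptual model only: the real datastore is a distributed service, not a dict.
datastore = {}  # outer map: key -> entity


def put(key, properties):
    """Store a copy of the entity's name/value pairs under its key."""
    datastore[key] = dict(properties)  # inner map: property name -> value


def get(key):
    """Return the entity stored under key, or None if it does not exist."""
    return datastore.get(key)


# Keys are modelled here as hypothetical (kind, id) tuples.
put(("Person", 1), {"name": "Alice", "age": 30})
# Entities of the same kind do not have to share the same properties.
put(("Person", 2), {"name": "Bob", "email": "bob@example.com"})

print(get(("Person", 1))["name"])  # Alice
print(get(("Person", 3)))          # None
```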

When you save() your entity, it gets stored somewhere in a gigantic farm of thousands of machines. In order to perform an atomic transaction, the datastore requires that all the data that is a part of that atomic transaction live on the same server. To give you some control over where your data is stored, the datastore has the concept of an entity group.

Remember the parent that is part of a Key? If an entity has a parent, it belongs to the same entity group as its parent. If an entity does not have a parent, it is the "root" of an entity group, and may be physically located anywhere in the cluster. All its child entities (and children of those children) are part of the same entity group and colocated in the same part of the datastore.

Note that an entity group is not defined by a class, but by an instance! Each instance of an entity class which has a null (or absent) parent defines the root of a separate entity group.

Why not store all your data with a common parent, putting it all in a single entity group? You can, but it's a bad idea. Google limits the number of requests per second that can be served by a particular entity group.

It is worth mentioning that the term parent is somewhat misleading. There is no "cascading delete" in the datastore; if you delete the parent entity it will NOT delete the child entities. For that matter, you can create child entities with a parent Key (or any other key as a member field) that points to a nonexistent entity! Parent is only important in that it defines entity groups; if you do not need transactions across several entities, you may wish to use a normal nonparent key relationship - even if the entities have a conceptual parent-child relationship.
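A small sketch may help to see how the parent in a key defines the entity group, and why deleting a parent leaves its children in place. It assumes a key is a tuple of (kind, id) pairs with ancestors first; `make_key`, `entity_group_root` and the `Album`/`Photo`/`Comment` kinds are hypothetical names, not part of any App Engine library.

```python
def make_key(kind, id_, parent=None):
    """Build a key as a path: the parent's path followed by this (kind, id)."""
    return (parent or ()) + ((kind, id_),)


def entity_group_root(key):
    """The entity group is identified by the first element of the key path."""
    return key[:1]


datastore = {}

album = make_key("Album", 1)                       # no parent: root of a group
photo = make_key("Photo", 10, parent=album)        # child of the album
comment = make_key("Comment", 100, parent=photo)   # grandchild, same group
other_album = make_key("Album", 2)                 # same kind, separate group

datastore[album] = {"title": "Holiday"}
datastore[photo] = {"file": "beach.jpg"}

print(entity_group_root(comment) == entity_group_root(album))      # True
print(entity_group_root(other_album) == entity_group_root(album))  # False

# No cascading delete: removing the parent leaves the child untouched.
del datastore[album]
print(photo in datastore)  # True

# A child may even point at a parent that was never stored.
orphan = make_key("Photo", 11, parent=make_key("Album", 999))
datastore[orphan] = {"file": "ghost.jpg"}
print(entity_group_root(orphan) in datastore)  # False
```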

Transaction Limitations
When you execute a datastore operation, you will either be in a transaction or you will not. If you execute within a transaction:
  • Each EG you touch via get/put/delete/query enlists that EG in the transaction.
    • You can enlist up to 5 EGs, but single-EG transactions are fastest
  • Queries must include an ancestor which defines the EG in which to search. You cannot query across EGs at all, not even in XG transactions.
  • There are some quirks at the low-level API: For example, get()s and query()s will see the datastore "frozen in time" and will not reflect updates even within the transaction. Objectify hides this behavior from you; subsequent fetches will see the same data seen (or updated) previously.
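The enlistment rule above can be pictured with a toy transaction object. This is a sketch under assumptions, not how the datastore or Objectify implements it: keys are tuples of (kind, id) pairs as a stand-in for real keys, and `Transaction`, `touch` and `TooManyEntityGroups` are hypothetical names. The limit of 5 is the figure quoted in the text.

```python
MAX_ENTITY_GROUPS = 5  # limit quoted in the text above


class TooManyEntityGroups(Exception):
    """Raised when a transaction tries to enlist more groups than allowed."""


class Transaction:
    def __init__(self):
        self.enlisted = set()  # roots of the entity groups touched so far

    def touch(self, key):
        """Record that a get/put/delete/query touched this key's entity group."""
        root = key[:1]  # first path element identifies the entity group
        if root not in self.enlisted and len(self.enlisted) >= MAX_ENTITY_GROUPS:
            raise TooManyEntityGroups("cannot enlist %r" % (root,))
        self.enlisted.add(root)


txn = Transaction()
for album_id in range(5):
    txn.touch((("Album", album_id),))        # five different entity groups

txn.touch((("Album", 0), ("Photo", 1)))      # same group as Album 0: allowed

try:
    txn.touch((("Album", 99),))              # a sixth group is rejected
except TooManyEntityGroups as exc:
    print("rejected:", exc)
```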

Transactionless
If you operate on the datastore without an explicit transaction, each datastore operation is treated like a separate little transaction which is retried separately. Note that batch operations are not transactional; each entity saved in a batch put() is effectively its own little transaction. If you perform a batch save of entities outside of a transaction, it is possible for some to succeed and some to fail. An exception from the datastore does not indicate that all operations failed.

Original Author: Nikolas Goebel
http://www.sitepoint.com/sql-or-nosql-google-app-engine-part-2/

When to choose NoSQL

The Google Datastore is meant for storing "dumb" data. It is not an RDBMS. It mainly addresses storing and querying very large data sets.

  1. Can a single server provide the performance we need? Maybe by utilizing caching? In this case RDBMS is the way to go. Look for example into CloudSQL, AppEngine’s purely relational offering.
  2. Do you plan on growing a multi-application environment working on the same dataset? Depending on your volume, RDBMS might be what you need, because it separates the database layer very strictly from the application.
  3. Do you need Ad-Hoc queries? In terms of query flexibility, SQL is the clear winner.
  4. Do you require perfect consistency? Even though there are ways to achieve strong consistency in the Datastore, it’s not what it was designed for. Again, RDBMS is the better choice.
  5. Are you expecting millions of reads & writes per second? The Datastore provides automatic scaling to infinity and beyond, and it’s right there for you to use with AppEngine.
  6. Do you need a simple, scalable way to persist entities with variable attributes? Even though you’ll have to handle consistency and data aggregation yourself, the schemaless Datastore should be what you need. And it’s integrated right into AppEngine, so it’s the ideal choice for quick prototypes with changing entities.

Continuous Query A.K.A Update Notification

The Datastore also supports notification when data is updated, through Prospective Search. The GAE app can then monitor data changes and notify the client side accordingly.




Performance Tips

1. Use a cursor instead of "LIMIT : OFFSET" to page through results, for example to fetch the second 10 rows
2. Select only the necessary fields by using a projection query
3. Use "GET" (lookup by key) instead of "Fetch" (query) where the key is known
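To illustrate the first tip, here is a rough simulation of why a cursor is cheaper than an offset when fetching the second 10 rows. It assumes the index is a sorted list of keys and uses the last key returned as the cursor; real datastore cursors are opaque strings, and `fetch_with_offset` and `fetch_with_cursor` are hypothetical helpers.

```python
import bisect

# Hypothetical index: 100 keys in sorted order, as an index scan would see them.
index = ["key%03d" % i for i in range(100)]


def fetch_with_offset(limit, offset):
    """Scan from the start of the index and throw away the first `offset` rows."""
    scanned = index[:offset + limit]          # rows the scan has to touch
    return scanned[offset:], len(scanned)


def fetch_with_cursor(limit, cursor=None):
    """Resume directly after the last key of the previous page."""
    start = 0 if cursor is None else bisect.bisect_right(index, cursor)
    page = index[start:start + limit]
    next_cursor = page[-1] if page else cursor
    return page, next_cursor, len(page)       # only the page itself is touched


# Second page of 10 rows, both ways.
page1, cursor, _ = fetch_with_cursor(10)
page2, cursor, touched_by_cursor = fetch_with_cursor(10, cursor)
offset_page, touched_by_offset = fetch_with_offset(10, 10)

print(page2 == offset_page)   # True
print(touched_by_cursor)      # 10
print(touched_by_offset)      # 20
```

The gap grows with the page number: in this model the offset scan touches every skipped row again, while the cursor scan always touches one page.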

Datastore

The Datastore uses 6 Bigtables to manage the data. Refer to
https://developers.google.com/appengine/articles/storage_breakdown for detail

  • Entities table
    • This one Bigtable holds all entities for all App Engine applications
  • Index tables
    • EntitiesByKind 
    • EntitiesByProperty ASC
    • EntitiesByProperty DESC
    • EntitiesByCompositeProperty 
      • Custom indexes table
  • Id sequences table
    • ID sequences are used to generate numerical datastore IDs for both entities and custom indexes.

Blobstore

Insert / Retrieve / Edit, in bulk – Flexible

  • Direct access to Blob data in memory – Fast access to Blob data
  • 5MB in ~2

2. CloudSQL - RDBMS, MySQL

MySQL is not meant for big data. Here "big data" means a scale of billions of rows. In other words, if the data runs to billions of rows, the first choice should be the Datastore, not CloudSQL.

3. CloudStorage - File Storage

4. Task Queues - Asynchronous execution

There are 2 types, Push and Pull. A push queue works more like an asynchronous thread. A pull queue allows you to execute the task on-premise, outside of Google App Engine.
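The difference between the two types is mostly about who drives execution. The sketch below is a plain-Python simulation, not the Task Queue API: `PushQueue`, `PullQueue`, `dispatch` and `lease` are hypothetical names, and `dispatch` stands in for the platform delivering tasks to the app.

```python
from collections import deque


class PushQueue:
    """The platform delivers each task to a handler inside the app."""

    def __init__(self, handler):
        self.handler = handler
        self.pending = deque()
        self.results = []

    def add(self, payload):
        self.pending.append(payload)   # returns at once; work happens later

    def dispatch(self):
        """Stand-in for the platform calling the app's handler per task."""
        while self.pending:
            self.results.append(self.handler(self.pending.popleft()))


class PullQueue:
    """Tasks wait until a worker, possibly outside App Engine, leases them."""

    def __init__(self):
        self.tasks = deque()

    def add(self, payload):
        self.tasks.append(payload)

    def lease(self, max_tasks):
        """Hand up to max_tasks tasks to the calling worker."""
        leased = []
        while self.tasks and len(leased) < max_tasks:
            leased.append(self.tasks.popleft())
        return leased


push = PushQueue(handler=lambda payload: payload.upper())
push.add("resize image")
push.dispatch()
print(push.results)          # ['RESIZE IMAGE']

pull = PullQueue()
for name in ("a", "b", "c"):
    pull.add(name)
print(pull.lease(2))         # ['a', 'b']
print(len(pull.tasks))       # 1
```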


5. Memcache

Memcache runs on a separate server, so each Memcache access involves a network call. The normal latency is 2 ~ 5 ms.


6. Instance Cache A.K.A Global Variable

It is much faster than Memcache. It lives in the memory of the GAE instance and has the same life span as the instance.

7. GAE Backend Instance

Configuration Limits

  • app: 5 backends
  • app: 10GB of backends
  • backend: 20 instances
  • backend: 10GB
  • 10GB combinations
    • B8x10
    • B4x20
    • B8x5 + B4x10
    • B8x5 + B4x5 + B2x10
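The combinations above can be checked with a little arithmetic. The per-class memory sizes below (B1 128MB, B2 256MB, B4 512MB, B8 1024MB) are not stated in this post; they are assumed from the backend class documentation of the time, and `total_gb` is a hypothetical helper.

```python
# Assumed memory per backend class in MB (not stated in the post itself).
INSTANCE_MEMORY_MB = {"B1": 128, "B2": 256, "B4": 512, "B8": 1024}


def total_gb(combination):
    """Total memory in GB for a {class: instance count} combination."""
    total_mb = sum(INSTANCE_MEMORY_MB[cls] * count
                   for cls, count in combination.items())
    return total_mb / 1024


combinations = [
    {"B8": 10},
    {"B4": 20},
    {"B8": 5, "B4": 10},
    {"B8": 5, "B4": 5, "B2": 10},
]

for combination in combinations:
    print(combination, total_gb(combination))   # each line ends in 10.0
```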

API deadlines apply

  • urlfetch: 5s default, up to 10s 
  • datastore: 30s

Size limits apply

  • HTTP: 32MB requests
  • urlfetch: 1MB request, 32MB response
  • memcache: 1MB objects
  • Blobstore: 2GB objects, 1MB response
  • Mail: 10MB send/receive
  • Tasks: 100KB

No uptime guarantee

  • best-effort service
  • expect polite and hard shutdown
  • various causes
  • Examples
    • software bugs
    • hardware failures
    • emergencies

8. Socket API

A GAE app can now use the Socket API to send messages out, for example to the Apple Push Notification service. This means a GAE app can send push notifications to the 2 most popular platforms, iOS and Android.

9. MapReduce

Map

  • Input: user data
  • Output: (key, value) pairs
  • User code

Shuffle

Collates values with the same key
  • Input: (key, value) pairs
  • Output: (key, [value]) pairs
  • No user code

Reduce

  • Input: (key, [value]) pairs
  • Output: user data
  • User code
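The three phases can be sketched in plain Python with a word count. This runs in a single process and is not the App Engine MapReduce library; `map_phase`, `shuffle_phase`, `reduce_phase`, `word_mapper` and `count_reducer` are hypothetical names. Only the mapper and the reducer are user code, matching the lists above.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Map: user data in, (key, value) pairs out."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Shuffle: collate values with the same key. No user code here."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped, reducer):
    """Reduce: (key, [value]) pairs in, user data out."""
    return {key: reducer(key, values) for key, values in grouped.items()}


# User code: the mapper and the reducer.
def word_mapper(line):
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(word, counts):
    return sum(counts)


lines = ["the quick fox", "the lazy dog", "The end"]
pairs = map_phase(lines, word_mapper)
grouped = shuffle_phase(pairs)
counts = reduce_phase(grouped, count_reducer)

print(grouped["the"])   # [1, 1, 1]
print(counts["the"])    # 3
print(counts["fox"])    # 1
```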

10. Channel & Feed API

The Channel API pushes messages from App Engine to the browser. The Feed API pushes internet feeds to the browser through "PubSubHubbub".




11. Tools / Libraries

Cloud Platform Examples
