Transition probability matrix generation (see the first sketch at the bottom of this page):[[BR]]
1. convert the timestamp of each record to a serial number and sort by it[[BR]]
2. divide the covered area into grids[[BR]]
3. loop over the records of each TAXIID: find the current grid and the next grid, and increment the corresponding cell of the grid matrix (row = current grid #, column = next grid #)[[BR]]
4. obtain the transition probability matrix by normalizing each row of the grid matrix[[BR]]
----
Update generation (see the second sketch at the bottom of this page):[[BR]]
1. set the parameters, such as the time period (1 day / 1440 mins), GUID (from 1 to 4000), number of grids, and the longitude and latitude ranges[[BR]]
2. initialize the taxi matrix ( GUID | current grid | timestamp ): assign the current grid according to the location matrix, which keeps the density information of the taxi locations, and set the timestamp to 0[[BR]]
3. every minute (timestamp + 1), check for every GUID whether it generates an update[[BR]]
4. compute the destination grid number from the transition probability matrix (converted to a probability CDF matrix); if it differs from the source grid number, generate an update[[BR]]
5. write | event type | GUID | source | destination | timestamp | to the output file[[BR]]
----
Table TAXIDATA has all data loaded; table TAXI1 loads only the first data file.[[BR]]
The table definition is as follows:[[BR]]
{{{
CREATE TABLE TAXIDATA
(
  ID           NUMBER(10) CONSTRAINT TAXIDATA_ID NOT NULL,
  TAXIID       NUMBER(7),
  LONGITUDE    NUMBER(9,6),
  LATITUDE     NUMBER(8,6),
  SPEED        NUMBER(3),
  ANGLE        NUMBER(3),
  DATETIME     TIMESTAMP(6),
  STATUS       NUMBER(1),
  EXTENDSTATUS NUMBER(1),
  REVISED      NUMBER(1),
  PRIMARY KEY(ID)
)
TABLESPACE USERS;
}}}
----
[[Image(location(10K_100grids).jpg)]][[BR]]
The picture shows 10k entries chosen from the first data file. The covered area spans longitude 121.2 to 121.8 and latitude 31 to 31.5, and is divided into 10*10 grids, i.e. 100 grids in total.[[BR]]
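----
The sketch below illustrates the transition probability matrix procedure described above. It is only an illustration, not the actual generation code: the in-memory record format, the grid-index helper, and the function names are assumptions; the 10*10 grid and the coordinate ranges follow the figure above.
{{{
# Sketch: build a row-normalized transition probability matrix from taxi records.
# Assumptions: `records` is an iterable of (taxiid, timestamp, longitude, latitude)
# tuples already loaded from TAXIDATA; the grid layout matches the figure above.
import numpy as np

LON_MIN, LON_MAX = 121.2, 121.8
LAT_MIN, LAT_MAX = 31.0, 31.5
GRIDS_PER_SIDE = 10                       # 10*10 = 100 grids in total
N_GRIDS = GRIDS_PER_SIDE * GRIDS_PER_SIDE

def grid_index(lon, lat):
    """Map a (longitude, latitude) point to a grid number in [0, N_GRIDS)."""
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRIDS_PER_SIDE),
              GRIDS_PER_SIDE - 1)
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRIDS_PER_SIDE),
              GRIDS_PER_SIDE - 1)
    return row * GRIDS_PER_SIDE + col

def transition_matrix(records):
    """Steps 1-4: sort per taxi, count grid-to-grid moves, normalize each row."""
    counts = np.zeros((N_GRIDS, N_GRIDS))
    by_taxi = {}
    for taxiid, ts, lon, lat in records:
        by_taxi.setdefault(taxiid, []).append((ts, lon, lat))
    for trace in by_taxi.values():
        trace.sort()                                   # sort by serial timestamp
        grids = [grid_index(lon, lat) for _, lon, lat in trace]
        for cur, nxt in zip(grids, grids[1:]):
            counts[cur, nxt] += 1                      # row = current grid, column = next grid
    row_sums = counts.sum(axis=1, keepdims=True)
    # rows with no observed transitions stay all-zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
}}}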
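A corresponding sketch of the update generation step follows. Only the per-minute loop and the CDF-based destination sampling mirror the description above; the uniform initial grid (standing in for the density-based location matrix), the "UPDATE" event type string, and the output format are placeholders.
{{{
# Sketch: generate per-minute updates by sampling destinations from the transition CDF.
# Assumptions: `trans` is the row-normalized matrix from the sketch above; the initial
# location assignment and the output record format are placeholders.
import numpy as np

N_GUIDS = 4000
MINUTES = 1440                                 # 1 day

def generate_updates(trans, out_path, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    cdf = np.cumsum(trans, axis=1)             # per-row probability CDF
    n_grids = trans.shape[0]
    # step 2: (GUID | current grid | timestamp); uniform initial grids as a stand-in
    current = rng.integers(0, n_grids, size=N_GUIDS)
    with open(out_path, "w") as out:
        for timestamp in range(1, MINUTES + 1):            # step 3: every minute
            for guid in range(1, N_GUIDS + 1):
                src = current[guid - 1]
                if cdf[src, -1] == 0:
                    continue                                # no observed transitions from this grid
                # step 4: inverse-CDF sampling of the destination grid from row `src`
                dst = min(int(np.searchsorted(cdf[src], rng.random())), n_grids - 1)
                if dst != src:                              # only a grid change produces an update
                    # step 5: | event type | GUID | source | destination | timestamp |
                    out.write(f"UPDATE|{guid}|{src}|{dst}|{timestamp}\n")
                    current[guid - 1] = dst
}}}
For example, `generate_updates(transition_matrix(records), "updates.txt")` would chain the two sketches together under the assumptions stated above.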