What happens when we live in a world of big data?

We have three options:

- Scale Out
- Sample
- Stream

- Smart Firewalls
- Content Distribution
- Network Analysis
- Recommender Systems

All of these domains either use or could use streaming algorithms to gain an advantage.

- Streaming Estimator: data goes in, estimates come out.
- Reservoir Sampling [Vitter 85]
- Bloom Filter: represents a streaming set.
- Count-Min Sketch: approximates a set with frequencies, a.k.a. a multiset.
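
To make the last item concrete, here is a minimal Count-Min Sketch in Go. It is an illustrative sketch, not code from the talk: the names `CountMin`, `NewCountMin`, `Add`, and `Estimate`, the width and depth values, and the choice of a row-salted FNV-1a hash are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a hypothetical minimal Count-Min Sketch:
// depth rows, each holding width counters.
type CountMin struct {
	width, depth uint32
	rows         [][]uint64
}

// NewCountMin allocates a sketch; depth must be at least 1.
func NewCountMin(width, depth uint32) *CountMin {
	rows := make([][]uint64, depth)
	for i := range rows {
		rows[i] = make([]uint64, width)
	}
	return &CountMin{width: width, depth: depth, rows: rows}
}

// index picks the counter for item in the given row by salting
// an FNV-1a hash with the row number (an assumed hashing scheme).
func (c *CountMin) index(item string, row uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(row), byte(row >> 8), byte(row >> 16), byte(row >> 24)})
	h.Write([]byte(item))
	return h.Sum32() % c.width
}

// Add records count occurrences of item in every row.
func (c *CountMin) Add(item string, count uint64) {
	for r := uint32(0); r < c.depth; r++ {
		c.rows[r][c.index(item, r)] += count
	}
}

// Estimate returns the smallest counter seen for item across rows.
// Collisions only ever inflate a counter, so this never underestimates.
func (c *CountMin) Estimate(item string) uint64 {
	est := c.rows[0][c.index(item, 0)]
	for r := uint32(1); r < c.depth; r++ {
		if v := c.rows[r][c.index(item, r)]; v < est {
			est = v
		}
	}
	return est
}

func main() {
	cm := NewCountMin(1024, 4)
	for _, w := range []string{"a", "b", "a", "c", "a"} {
		cm.Add(w, 1)
	}
	fmt.Println(cm.Estimate("a")) // at least 3; hash collisions can only raise it
}
```

Memory stays fixed at width × depth counters no matter how many distinct items pass through, which is the trade the sketch makes for approximate answers.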

- Parallel Random Number Generator
- Monte Carlo Simulation
- Reservoir Sampling to reduce load on your Cloud.
- Code at reservoir.go
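
The contents of `reservoir.go` are not reproduced here. As a stand-in, the following is a minimal sketch of the classic reservoir sampling scheme usually attributed to Vitter (Algorithm R), assuming an integer stream and a fixed seed. The `Reservoir` type and its method names are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Reservoir (hypothetical name) keeps a uniform random sample of up to
// k items from a stream whose length is not known in advance.
type Reservoir struct {
	k      int
	seen   int
	sample []int
	rng    *rand.Rand
}

// NewReservoir creates a reservoir of capacity k with a seeded generator.
func NewReservoir(k int, seed int64) *Reservoir {
	return &Reservoir{
		k:      k,
		sample: make([]int, 0, k),
		rng:    rand.New(rand.NewSource(seed)),
	}
}

// Observe feeds one stream item into the reservoir.
func (r *Reservoir) Observe(x int) {
	r.seen++
	// The first k items fill the reservoir directly.
	if len(r.sample) < r.k {
		r.sample = append(r.sample, x)
		return
	}
	// Afterwards, item number `seen` replaces a random slot
	// with probability k/seen.
	if j := r.rng.Intn(r.seen); j < r.k {
		r.sample[j] = x
	}
}

func main() {
	r := NewReservoir(5, 42)
	for i := 0; i < 1000; i++ {
		r.Observe(i)
	}
	fmt.Println(r.sample) // five values sampled from 0..999
}
```

Only k items are ever held in memory, so a downstream service sees a bounded sample instead of the full stream.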

- Sketches are a neat mathematical data structure; Graham Cormode studies them heavily.
- Represent a vector *v* with *f*(*v*) such that *f*(*v*) is much smaller than *v*.
- If *f* is a linear function, i.e. *f*(*v* + *w*) = *f*(*v*) + *f*(*w*), good times!
- Bloom Filters have this property for the boolean field.

Image: Peter Scott
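
The linearity property can be checked directly on a toy Bloom filter. The sketch below assumes that "+" in the boolean setting is read as bitwise OR (set union), and that positions are derived by double hashing from a single FNV-1a hash. The names `Bloom`, `Union`, `numBits`, and `numHashes`, and the sizes chosen, are illustrative assumptions rather than anything taken from the slides.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	numBits   = 256 // filter size in bits (illustrative)
	numHashes = 3   // bit positions set per item (illustrative)
)

// Bloom is a toy Bloom filter stored as a fixed-size bit array.
type Bloom [numBits]bool

// positions derives numHashes bit positions from one 64-bit FNV-1a
// hash using double hashing: h1 + i*h2.
func positions(item string) [numHashes]uint32 {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	var out [numHashes]uint32
	for i := uint32(0); i < numHashes; i++ {
		out[i] = (h1 + i*h2) % numBits
	}
	return out
}

// Add sets every bit position associated with item.
func (b *Bloom) Add(item string) {
	for _, p := range positions(item) {
		b[p] = true
	}
}

// Contains reports whether item may be in the set
// (false positives are possible, false negatives are not).
func (b *Bloom) Contains(item string) bool {
	for _, p := range positions(item) {
		if !b[p] {
			return false
		}
	}
	return true
}

// Union ORs two filters bit by bit: "addition" in the boolean setting.
func Union(a, b Bloom) Bloom {
	var out Bloom
	for i := range out {
		out[i] = a[i] || b[i]
	}
	return out
}

func main() {
	var v, w, both Bloom
	v.Add("alice")
	w.Add("bob")
	both.Add("alice")
	both.Add("bob")
	// Sketching each set and combining equals sketching the combined set.
	fmt.Println(Union(v, w) == both) // true
}
```

This is what makes such sketches easy to distribute: each worker can sketch its own shard of the stream, and the partial sketches can be merged afterwards.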

@inproceedings{cormode2009forward, title={Forward decay: A practical time decay model for streaming systems}, author={Cormode, Graham and Shkapenyuk, Vladislav and Srivastava, Divesh and Xu, Bojian}, booktitle={Data Engineering, 2009. ICDE'09. IEEE 25th International Conference on}, pages={138--149}, year={2009}, organization={IEEE} }