# CSE332S Lecture 10

## Associative Containers

| Container            | Sorted | Unique Key | Allow duplicates |
| -------------------- | ------ | ---------- | ---------------- |
| `set`                | Yes    | Yes        | No               |
| `multiset`           | Yes    | No         | Yes              |
| `unordered_set`      | No     | Yes        | No               |
| `unordered_multiset` | No     | No         | Yes              |
| `map`                | Yes    | Yes        | No               |
| `multimap`           | Yes    | No         | Yes              |
| `unordered_map`      | No     | Yes        | No               |
| `unordered_multimap` | No     | No         | Yes              |

Associative containers support efficient lookup by key, vs. sequence containers, which look up by position

Associative containers differ in 3 design dimensions

- Ordered vs. unordered (tree vs. hash structured)
  - We’ll look at ordered containers today, unordered next time
- Set vs. map (just the key or the key and a mapped type)
- Unique vs. multiple instances of a key

### Ordered Associative Containers

Example: `set`, `multiset`, `map`, `multimap`

Ordered associative containers are tree structured

- Insert/delete maintain sorted order, e.g. by `operator<`
- Don’t use sequence algorithms like `sort` or `find` with them
  - Already sorted, so sorting is unnecessary (and `std::sort` requires random-access iterators, which these containers do not provide)
  - `find` is more efficient (logarithmic time) as a container method

Ordered associative containers are bidirectional

- Can iterate through them in either direction, find sub-ranges
- Can use as source or destination (via an insert iterator) for algorithms like `copy`

### Set vs. Map

A set/multiset stores keys (the key is the entire value)

- Used to collect single-level information (e.g., a set of words to ignore)
- Avoid in-place modification of keys (especially in a set or multiset)

A map/multimap associates keys with mapped types

- That style of data structure is sometimes called an associative array
- Map subscripting operator takes key, returns reference to mapped type
  - E.g., `string s = employees[id]; // returns employee name`
  - If key does not exist, `[]` creates a new entry with the key and a value-initialized instance of the mapped type (0 if numeric, default constructed if class)

### Unique vs. Multiple Instances of a Key

In set and map containers, keys are unique

- In set, keys are the entire value, so every element is unique
- In map, multiple keys may map to same value, but can’t duplicate keys
- Attempt to insert a duplicate key is ignored by the container (`insert` returns a `pair` whose `second` member is `false`)

In multiset and multimap containers, duplicate keys are ok

- Since containers are ordered, duplicates are kept next to each other
- Insertion will always succeed, at appropriate place in the order

### Key Types, Comparators, Strict Weak Ordering

Like the `sort` algorithm, can modify a container’s order with any callable object that can be used correctly for `sort`

Must establish a **strict weak ordering** over elements

- Two keys cannot both be less than each other (inequality), so comparison operator must return `false` if they are equal
- If `a < b` and `b < c` then `a < c` (transitivity of inequality)
- If `!(a < b)` and `!(b < a)` then `a == b` (equivalence)
- If `a == b` and `b == c` then `a == c` (transitivity of equivalence)

_Sounds like definition of order in math_

Type of the callable object is used in container type

- Cool example in LLM pp. 426 using `decltype` for a function
  - Could do this by declaring your own pointer to function type
  - But much easier to let compiler’s type inference figure it out for you

### Pairs

Maps use `pair` template to hold key, mapped type

- A `pair` can be used to hold any two types
- Maps use the key type as the 1st element of the pair (`p.first`)
- Maps use the mapped type as the 2nd element of the pair (`p.second`)

Can compare `pair` variables using operators

- Equivalence, less than, other relational operators

Can declare `pair` variables several different ways

- Easiest uses initialization list (curly braces around values) (e.g. `pair<string, int> p = {"hello", 1};`)
- Can also default construct (value initialization) (e.g. `pair<string, int> p;`)
- Can also construct with two values (e.g. `pair<string, int> p("hello", 1);`)
- Can also use special `make_pair` function (e.g. `pair<string, int> p = make_pair("hello", 1);`)

### Unordered Containers (UCs)

Example: `unordered_set`, `unordered_multiset`, `unordered_map`, `unordered_multimap`

UCs use `==` to compare elements instead of `<` to order them

- Types in unordered containers must be equality comparable
- When you write your own structs, overload `==` as well as `<`

UCs store elements in indexed buckets instead of in a tree

- Useful for types that don’t have an obvious ordering relation over their values

UCs use hash functions to put and find elements in buckets

- May improve performance in some cases (if performance profiling suggests so)
- Declare UCs with pluggable hash functions via callable objects, `decltype`, etc.
- Or specialize the `std::hash` template for your type, which is used by default

### Summary

Use associative containers for key based lookup

- Ordering of elements is maintained over the keys
- Think ranges and ordering rather than position indexes
- A sorted vector may be a better alternative (depends on which operations you will use most often, and their costs)

Ordered associative containers use strict weak order

- Operator `<` or any callable object that acts like `<` over `int` can be used

Maps allow two-level (dictionary-like) lookup

- Vs. sets which are used for “there or not there” lookup
- Map uses a `pair` to associate key with mapped type

Can enforce uniqueness or allow duplicates

- Duplicates are still stored in order, creating “equal ranges”

## IO Libraries

### `std::copy()`

Reference for `std::copy()`: http://www.cplusplus.com/reference/algorithm/copy/

Takes 3 parameters:

- `copy(InputIterator first, InputIterator last, OutputIterator result);`
- `[first, last)` specifies the range of elements to copy.
- `result` specifies where we are copying to.

Example:

```cpp
vector<int> v = {1, 2, 3, 4, 5};
// copy v to cout
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
```

Some useful destination iterator types:

1. `ostream_iterator` - iterator over an output stream (like `cout`)
2. `insert_iterator` - inserts elements directly into an STL container (will be practiced in studio)

```cpp
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;

int main(int argc, char *argv[]) {
    if (argc != 3) {
        cerr << "Usage: " << argv[0] << " <input_file> <output_file>" << endl;
        return 1;
    }

    // use distinct names for the file names and the stream objects
    string input_name = argv[1];
    string output_name = argv[2];
    ifstream input_file(input_name.c_str());
    ofstream output_file(output_name.c_str());
    if (!input_file || !output_file) {
        cerr << "Could not open " << input_name << " or " << output_name << endl;
        return 1;
    }

    // don't skip whitespace
    input_file >> noskipws;
    istream_iterator<char> i(input_file);
    ostream_iterator<char> o(output_file);

    // copy the input file to the output file:
    // copy(InputIterator first, InputIterator last, OutputIterator result);
    copy(i, istream_iterator<char>(), o);

    cout << "Copied input file " << input_name << " to " << output_name << endl;
    return 0;
}
```

### IO reviews

How to move data into and out of a program:

- Using `argc` and `argv` to pass command line args
- Using `cout` to print data out to the terminal
- Using `cin` to obtain data from the user at run-time
- Using an `ifstream` to read data in from a file
- Using an `ofstream` to write data out to a file

How to move data between strings, basic types

- Using an `istringstream` to extract formatted int values
- Using an `ostringstream` to assemble a string

### Streams

A stream is simply a buffer of data (array of bytes).
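To make the buffer idea concrete, here is a minimal sketch using the string streams from the review list: a string stream is a stream whose buffer is an in-memory string. The helper names `parse_ints` and `make_label` are hypothetical and chosen only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: pull whitespace-separated ints out of a string.
// The loop stops at end of input or at the first token that is not an int.
std::vector<int> parse_ints(const std::string& text) {
    std::istringstream in(text);  // the string becomes the stream's buffer
    std::vector<int> values;
    int v;
    while (in >> v) {             // each read consumes data from the buffer
        values.push_back(v);
    }
    return values;
}

// Hypothetical helper: assemble a string piece by piece in an output buffer.
std::string make_label(const std::string& name, int count) {
    std::ostringstream out;
    out << name << ": " << count; // data accumulates in the stream's buffer
    return out.str();             // str() returns a copy of the buffered text
}
```

For example, `parse_ints("10 20 30")` yields the values 10, 20, 30, and `make_label("apples", 3)` yields `"apples: 3"`.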
Insertion operator (`<<`) specifies how to move data from a variable into an output stream

Extraction operator (`>>`) specifies how to pull data off of an input stream and store it into a variable

Both operators defined for built-in types:

- Numeric types
- Pointers
- Pointers to char (`char *`)

Cannot copy or assign stream objects

- Copy construction or assignment syntax using them results in a compile-time error

Extraction operator consumes data from input stream

- "Destructive read" that reads a different element each time
- Use a variable if you want to read same value repeatedly

Need to test streams’ condition states

- E.g., calling the `is_open` method on a file stream
- E.g., use the stream object in a `while` or `if` test
- Insertion and extraction operators return a reference to a stream object, so can test them too

File stream destructor calls `close` automatically

### Flushing and stream manipulators

An output stream may hold onto data for a while, internally

- E.g., writing chunks of text rather than a character at a time is efficient
- When it writes data out (e.g., to a file, the terminal window, etc.) is entirely up to the stream, **unless you tell it to flush out its buffers**
- If a program crashes, any un-flushed stream data is lost
- So, flushing streams reasonably often is an excellent debugging trick

Can tie an input stream directly to an output stream

- Output stream is then flushed by call to input stream extraction operator
- E.g., `my_istream.tie(&my_ostream);`
- `cin` is already tied to `cout` (useful for prompting the user, getting input)

Also can flush streams directly using stream manipulators

- E.g., `cout << flush;` or `cout << endl;`, or `cout << unitbuf;` to flush after every subsequent output operation

Other stream manipulators are useful for formatting streams

- Field layout: `setw`, `setprecision`, etc.
- Display notation: `oct`, `hex`, `dec`, `boolalpha`, `noboolalpha`, `scientific`, etc.
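The flushing and formatting manipulators above can be tried out with a short sketch like the one below. The helper names `format_price`, `format_flags`, and `log_progress` are hypothetical; the manipulators themselves are the standard ones from `<iomanip>` and `<ios>` (the field-width manipulator is spelled `std::setw`). The formatting helpers write into an `ostringstream` so the result can be inspected as a string.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: two decimal places, right-aligned in an 8-character field.
std::string format_price(double price) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << std::setw(8) << price;
    return out.str();
}

// Hypothetical helper: same int in hex and decimal, then a bool as text.
std::string format_flags(int n, bool b) {
    std::ostringstream out;
    out << std::hex << n << ' ' << std::dec << n << ' ' << std::boolalpha << b;
    return out.str();
}

// Hypothetical helper: print a progress message and flush right away,
// so the text is not left sitting in cout's buffer if the program crashes.
void log_progress(const std::string& msg) {
    std::cout << msg << std::flush;
}
```

Note that `setw` applies only to the next formatted output, while `fixed`, `setprecision`, `hex`, and `boolalpha` stay in effect on that stream until changed. With these helpers, `format_price(3.14159)` gives `"    3.14"` and `format_flags(255, true)` gives `"ff 255 true"`.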