Simulating Missing Language Features (with Cheap Tricks)

If you have ever implemented “real-world” application logic in an uncommon or less advanced programming languages, you may have had to deal with the fact that features you take for granted in other languages are not available. When I encounter such obstacles I am often fascinated by the challenge of trying to add the missing functionality to the language in one way or another. Sometimes I will implement the feature “properly”, but often I find it easier and more convenient just to simulate the behavior I want. Over the years I have been playing around with a fair share of odd programming environments, so here I present you with a collection of some cheap tricks I have used to overcome language limitations in the past.

Simulating Hash Maps with Arrays

These days, any “respectable” language has some kind of support for hash maps or associative arrays, either in the language itself or via standard libraries. In Java there is the Map interface and the HashMap implementation, in C++ there is std::map ~~and std::hash_map~~ , while in Perl, PHP and Python, associative containers are a part of the language itself. But what do you do if you want the convenience of easy access to key/value pairs in languages that don’t provide any such features?

Of course, you could always set aside a couple of days to implement a real hash map, but unless you are trying to impress your university professor or performance actually matters to you, it will probably just be a waste of time. Implementing one may also prove rather difficult if your language lacks other “essential” features, like data records, structures or object encapsulation (i.e. classes).

However, most languages support arrays, and as it turns out, we can easily simulate a mapping container using two arrays:

function findIndex(array, value):
for index from 0 to size of array-1:
if array[index] equals value:
return index
return -1 // value not found

function setMapValue(keyArray, valArray, key, value):
index = findIndex(keyArray, key)
if index equals -1
append key to keyArray
append value to valArray
else
valArray[index] = value

function getMapValue(keyArray, valArray, key):
index = findIndex(keyArray, key)
if index equals -1
return nothing // key not found
return valArray[index]

Using the two utility functions setMapValue() and getMapValue(), we can now store and retrieve key/value pairs from our (simulated) map:

setMapValue(keyArray, valArray, "sortValue", 15)
setMapValue(keyArray, valArray, "refCount", 0)
...
sortValue = getValue(keyArray, valArray, "sortValue")
refCount = getMapValue(keyArray, valArray, "refCount")

It may look a bit dirty, but compared to “manually” manipulating array pairs spread around the code, I find this technique very useful when working in “less sophisticated” environments.

Simulating Arrays with Variables

Hash maps and associative containers are somewhat advanced data structures, so I don’t expect every home-grown programming language or proprietary third-party scripting engine to support them, but what about arrays? What if your language has no concept of a sequence of (related) values?

The answer is simple. Just simulate arrays using regular variables:

function setArrayValue(index, value)
if index equals 0
arrayVariable0 = value
return
if index equals 1
arrayVariable1 = value
return
if index equals 2
arrayVariable2 = value
return

function getArrayValue(index)
if index equals 0
return arrayVariable0
if index equals 1
return arrayVariable1
if index equals 2
return arrayVariable2
return nothing // index out of bounds

As the example illustrates, this technique requires a lot of typing and code duplication, and is therefore somewhat tedious to implement for large arrays. Also, your language may require variables to be defined before use, resulting in even more verbose code. Another disadvantage is that since the variable names are hard-coded, you need a separate function for each array and care must be taken to avoid name collisions between distinct arrays.

This may sound like a debugging nightmare waiting to happen, but it is also an excellent opportunity to utilize your favorite—hopefully more advanced—scripting language and write a simple source code generator, saving you all the trouble of extensive manual typing and hard-to-find copy/paste errors.

An interesting feature of these “manually implemented” arrays is that they will allow you to have arrays with holes in them, or arrays using arbitrary indices of your choosing. If your implementation language supports dynamic variable typing, you can even use this technique to simulate arrays of mixed data types.

Utilizing Dynamic Variable Names

If your language supports rune-time generation of variable names (often called dynamic variables or variable variables), the array implementation described above can be rewritten to something much simpler (variable variable syntax borrowed from PHP):

function setArrayValue(array, index, value)
variableName = array + index
${variableName} = value

function getArrayValue(array, index)
variableName = array + index
return ${variableName}

In fact, the same trick can also be used to simplify the “hash map” implementation as well, eliminating the need for two array parameters:

function setMapValue(array, key, value):
keyArray = array + "Key"
valArray = array + "Value"
index = findIndex(${keyArray}, key)
if index equals -1
append key to ${keyArray}
append value to ${valArray}
else
${valArray}[index] = value

function getMapValue(array, key):
keyArray = array + "Key"
valArray = array + "Value"
index = findIndex(${keyArray}, key)
if index equals -1
return nothing // key not found
return ${valArray}[index]

Note that dynamic variables are considered by many as one of the most dangerous features of (modern) scripting languages, so be careful not to shoot yourself in the foot. They remind me of the good old days of self-modifying assembly code: great fun for (obscure) hacks and very hard to debug when things go wrong.

Simulating Random Access to List Elements

Some languages support iteration over lists, but not direct access to elements via indices. If you want to “break the rules”, and index lists directly anyway, you can often achieve this with a custom function or macro—assuming the language provides you with a way to define those—like this:

function getValueByIndex(list, index)
i = 0
for each element in list
if i equals index
return i
return nothing // index out of bounds

Again, iterating over the entire list to find your element may seem a bit odd, but when your data set is small, or you don’t care about efficiency, this approach works surprisingly well.

Simulating Data Records with (Simulated) Hash Maps

Have you ever wanted composite data structures while working in a language that doesn’t support them? Well, I have, and I usually get my will anyway, one way or the other.

Using the map implementation from before, we can simulate a three-field Person record with ease:

setMapValue("person", "firstName", "Johnny")
setMapValue("person", "lastName", "Mnemonic")
setMapValue("person", "age", 37)

...

firstName = getMapValue("person", "firstName")
lastName = getMapValue("person", "lastName")
age = getMapValue("person", "age")

Also here we can use dynamic variable names to simulate more complex data structures, like an array of Person records:

function setPerson(array, index, firstName, lastName, age, occupation)
recordName = array + index
setMapValue(${recordName}, "firstName", firstName)
setMapValue(${recordName}, "lastName", lastName)
setMapValue($(recordName}, "occupation", occupation)

Unfortunately, unless your language supports multiple return values or pass-by-reference parameters, retrieving the data from the array will involve a bit more “manual labor”:

function printRecords(prefix, arraySize)
for index from 0 to arraySize-1
recordName = prefix + index
firstName = getMapValue(${recordName), "firstName")
lastName = getMapValue(${recordName), "lastName")
occupation = getMapValue(${recordName), "occupation")
print "Record #" + index
print "First name: " + firstName
print "Last name: " + lastName
print "Occupation: " + occupation

This may not be the best way to represent thousands of records from a collection of database tables, but if you only need to read a few fields from a configuration file or a convenient way to organize an internal data structure, this approach can make things just a bit more readable and maintainable—at least in situations where the alternative would be manually managing separate variables for each field in each record, which can be a real drag, both to implement and maintain.

Simulating Variables with …

No, really. You don’t want to do that.

Conclusion

Often when suggesting feature simulations like these to people struggling with language limitations, I get negative reactions. Their first response is usually something like “No, that’s ugly” or “No, we can’t do that. It will be too slow”. Sometimes these concerns are valid, but most of the time the assumptions are just wrong.

When it comes to ugliness, I will be the first to admit that tricks like these are simple and ugly, but as it turns out, they often get the job done, and if you are lucky, they may even make your code more readable and maintainable. As for speed concerns, if you are worried about speed you should not be using a language requiring hacks to support these features anyway. You would probably be better off chosing a different approach, preferably by using a “real” langauge. In fact, if you encounter problems like these, and struggle with language limitations, on a regular basis, chances are you are using the wrong language for your tasks.

roberth on January 9, 2009 at 16:01

There is no std::hashmap (or std::hash_map) in standard C++. There will come a std::unordered_map in the next version of C++, which some compilers ship as std::tr1::undordered_map. See http://www.boost.org/doc/libs/1_37_0/doc/html/unordered.html and its references.