Purify Thyself

cyboryxmen · Post by **cyboryxmen** » July 2nd, 2018, 6:33 pm

All programmers should strive to use pure functions as much as possible. Input goes in as parameters, output goes out as the return value. The function should do nothing else.

Code:

int function ( const int input )
{
    return input * 2; // output
}

It's simple, elegant, clear, concise and, more importantly, easier to test for bugs. The more a function does beyond taking input and returning output, the greater the chance of running into unexpected side effects.

Code:

#include <stdexcept>
#include <string>

std::string global_string;

void function_that_sets_the_global_string ( )
{
    global_string = "String set!";
}

int function_that_isnt_supposed_to_set_the_global_string ( )
{
    global_string = "I set it anyway! MUHAHAHAHAHAHAHAHAHAH!";
    return 0;
}

void function_that_relies_on_the_global_string_being_set ( const int some_value )
{
    if ( global_string != "String set!" )
    {
        throw std::logic_error { "wtf bro" }; // somebody else changed global_string
    }
}

void run ( )
{
    function_that_sets_the_global_string ( );
    int some_value = function_that_isnt_supposed_to_set_the_global_string ( );
    function_that_relies_on_the_global_string_being_set ( some_value );
}

People say that situations like these can be avoided by preventing functions from changing global variables (or better yet, by not having global variables to begin with). I'm honestly willing to extend that limitation to pointers and references to objects outside of the function.

Code:

void do_not_do_this ( char* buffer )
{
    char set_string [] = "String set!";
    std::memcpy ( buffer, set_string, sizeof ( set_string ) );
}

std::string do_this_instead ( )
{
    return "String set!";
}

Input goes in; output goes out. You don't have to worry about any unseen side effects on your code because there are none.
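To make that concrete, here's a rough sketch of the earlier global_string example rewritten without the global. The names make_string, compute_value and check_string are made up for this illustration; they aren't from the original code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Pure: the string comes back as the return value instead of being written to a global.
std::string make_string ( )
{
    return "String set!";
}

// Pure: this function has no access to the caller's string, so it cannot overwrite it.
int compute_value ( )
{
    return 0;
}

// The string is an explicit parameter, so the dependency is visible at the call site.
void check_string ( const std::string& string_to_check )
{
    if ( string_to_check != "String set!" )
    {
        throw std::logic_error { "string was not set" };
    }
}

void run ( )
{
    const std::string local_string = make_string ( );
    const int some_value = compute_value ( );

    // Nothing called in between could have touched local_string.
    assert ( local_string == "String set!" );
    assert ( some_value == 0 );

    check_string ( local_string );
}
```

Since the string only travels through return values and parameters, you can see everything that could have changed it just by reading run.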

There are situations where large objects are hard to manage with pure functions.

Code:

std::vector<int> sort ( const std::vector<int>& input )
{
    std::vector<int> output;

    //Sort input into output

    return output;
}

std::vector<int> input;
std::vector<int> sorted_input = sort ( input );

In this example, you're creating an unnecessary copy of input when you could've just sorted input directly. If that vector is thousands of ints long, that copy will become very expensive very quickly. Looking at this, you might be tempted to just make the function edit input directly.

Code:

void sort ( std::vector<int>& input )
{
    //Sort input in place
}

std::vector<int> input;
sort ( input );

However, a compromise can be made by taking input by value and passing it in as an rvalue, so the function can reuse it.

Code:

std::vector<int> sort ( std::vector<int> input )
{
    //Sort input in place

    return input; // return input as output
}

std::vector<int> input;
std::vector<int> sorted_input = sort ( std::move ( input ) );

Now you can still use pure functions but have them perform practically the same as their impure counterparts!
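Here's a filled-in sketch of that by-value version, assuming std::sort does the actual sorting. I've called it sorted (my name, not from the code above) so it doesn't get confused with std::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Takes its input by value, so the function owns its own vector and may sort it in place.
std::vector<int> sorted ( std::vector<int> input )
{
    std::sort ( input.begin ( ), input.end ( ) );
    return input; // a by-value parameter is moved, not copied, on return
}

void usage ( )
{
    std::vector<int> numbers { 3, 1, 2 };

    // Passing an lvalue copies: numbers itself is left untouched.
    const std::vector<int> copy_sorted = sorted ( numbers );
    assert ( ( numbers == std::vector<int> { 3, 1, 2 } ) );
    assert ( ( copy_sorted == std::vector<int> { 1, 2, 3 } ) );

    // Passing an rvalue moves: the buffer is handed over instead of copied.
    // Don't rely on the contents of numbers after this line.
    const std::vector<int> move_sorted = sorted ( std::move ( numbers ) );
    assert ( ( move_sorted == std::vector<int> { 1, 2, 3 } ) );
}
```

The caller decides the cost: pass the vector as-is to keep the original and pay for a copy, or std::move it in when the unsorted version is no longer needed.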

Of course, you can't make a program entirely out of pure functions. Eventually, somebody needs to take the output that the pure function gave and show it to the outside world.

Code:

auto output = some_func ( );
std::cout << "The output: " << output << '\n'; // Show our output to the world!

You can't get rid of these so-called imperative functions completely. What you can do, however, is reduce their content by moving as much of their code as possible into pure functions. Take this example:

Code:

void print_grade ( const std::string& name, const std::vector<student_type>& list_of_students )
{
    for ( std::size_t index = 0; index < list_of_students.size ( ); ++index )
    {
        if ( list_of_students [ index ].name == name )
        {
            std::cout << "Grade: " << list_of_students [ index ].grade << '\n';
            return; // found the student, so don't fall through to the throw
        }
    }

    throw std::out_of_range { "student not found" };
}

We can use a pure function that returns an index to the student to reduce the content of this imperative function.

Code:

std::size_t find_student_index ( const std::string& name, const std::vector<student_type>& list_of_students )
{
    for ( std::size_t index = 0; index < list_of_students.size ( ); ++index )
    {
        if ( list_of_students [ index ].name == name )
        {
            return index;
        }
    }

    throw std::out_of_range { "student not found" };
}

void print_grade ( const std::string& name, const std::vector<student_type>& list_of_students )
{
    // All of the searching lives in the pure function; only the printing stays here.
    std::size_t index = find_student_index ( name, list_of_students );
    std::cout << "Grade: " << list_of_students [ index ].grade << '\n';
}

Separating your code base into imperative functions and pure functions, and reducing the content and number of the imperative ones, makes it easier to find bugs. Pure functions are easy to test: just give them inputs and see if they return the right outputs. For imperative functions, you have to run the program and see if it behaves correctly. If the program fails because of an unexpected change in some variable, you'll know to look only at the imperative functions, since they're the only ones capable of changing that variable.
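As a rough sketch of what such a test could look like for find_student_index: student_type is never defined in this post, so the struct below is an assumption, and test_find_student_index is just a name picked for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed layout; the post never shows what student_type actually contains.
struct student_type
{
    std::string name;
    int grade;
};

std::size_t find_student_index ( const std::string& name, const std::vector<student_type>& list_of_students )
{
    for ( std::size_t index = 0; index < list_of_students.size ( ); ++index )
    {
        if ( list_of_students [ index ].name == name )
        {
            return index;
        }
    }

    throw std::out_of_range { "student not found" };
}

// No setup beyond the inputs: give it a list and a name, then check what comes back.
void test_find_student_index ( )
{
    const std::vector<student_type> students { { "Ann", 90 }, { "Bob", 75 } };

    assert ( find_student_index ( "Ann", students ) == 0 );
    assert ( find_student_index ( "Bob", students ) == 1 );

    // A missing name should be reported with an exception.
    bool threw = false;
    try
    {
        find_student_index ( "Eve", students );
    }
    catch ( const std::out_of_range& )
    {
        threw = true;
    }
    assert ( threw );
}
```

Notice that the test never has to capture std::cout or set up any global state. That's the payoff of moving the search out of print_grade.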

Programming like this takes some time to get used to, but once you do, you'll thank yourself for putting in the effort. It's the best way of creating simple code that's very easy to test and debug!

MrGodin · Post by **MrGodin** » July 2nd, 2018, 11:39 pm

Nice info, definitely makes a lot of sense. I would pass info into a function and make it do a whole lot of things it didn't need to, so I am guilty.

Good read.

Post by **chili** » July 6th, 2018, 2:14 pm

Good stuff. I will agree that pure functions are cleaner and preferable when feasible.

A little tangential, but passing shit in instead of hardcoding references (e.g. in the constructor) is good for test-driven dev as well, since it allows for dependency injection.
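A minimal sketch of that idea, assuming a made-up grade_printer class (not from anyone's actual code): the output stream gets passed into the constructor instead of hardcoding std::cout, so a test can hand it a string stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class: the stream it writes to is injected through the constructor.
class grade_printer
{
public:
    explicit grade_printer ( std::ostream& output ) : output_ ( output ) { }

    void print ( const int grade )
    {
        output_ << "Grade: " << grade << '\n';
    }

private:
    std::ostream& output_; // the caller must keep the stream alive while the printer is used
};

void test_grade_printer ( )
{
    std::ostringstream fake_output; // stands in for std::cout during the test
    grade_printer printer ( fake_output );

    printer.print ( 90 );

    assert ( fake_output.str ( ) == "Grade: 90\n" );
}
```

In the real program you would construct it with std::cout; the class itself doesn't need to know the difference.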
