Difference between revisions of "Unit Testing"

Latest revision as of 10:35, 26 October 2015

This page will track the effort to improve unit test and other automated test coverage for OpenSSL. Everyone is encouraged to help with this effort, but this page will track which community members have accepted specific responsibilities, and what tasks are currently underway.

Goals[edit]

The goals of this project are:

Establish a written contribution/code review policy that requires:
- tests for every new code change, except for very trivial changes that do not directly impact program logic (e.g. documentation fixes, string constant updates, build script tweaks); this can be limited to an agreed-upon subset of the code to start, but would hopefully be expanded to cover most, if not all, of the code base
- tests for every bugfix
- smaller, well-tested changes building up to a complete feature (as opposed to large and/or untested changes that implement a large, complex feature all at once)
Establish a schedule of Build Cops and Tutors to begin enforcing rollbacks/fixes and to help shepherd new contributors through the testing process.
Set up a series of publicly-accessible continuous integration servers using whatever system the community prefers (e.g. Jenkins, Buildbot, etc.). These can be dedicated hardware or possibly Amazon Web Services-based. Establish a written policy that these remain green at all times, and that breakages be rolled-back or fixed within no more than 30 minutes.

We can add more concrete goals to this list after we make progress on these. Getting people to own roles, setting up build infrastructure, and setting and enforcing policy are important first steps. Upon that foundation, we can begin to make serious progress towards improved test coverage, code quality, and defect prevention.

Discussion List[edit]

openssl-testing is the mailing list dedicated to the unit/automated testing effort, to avoid clogging openssl-dev. Membership is open to whoever wishes to join, even if only to lurk. Ideally all testing discussions will eventually move to openssl-dev once we have processes, tools, conventions, etc. in place.

Team[edit]

We can define new roles as needed. The success of this effort will be inversely proportional to the number of roles with any one individual's name next to them. Each role can have more than one person filling it, but needs specific people to fill them.

The point is not to tell people what to do and not do: it’s to establish clear responsibilities so everyone can run with them right away and begin applying their creative energy to solutions, rather than waste it figuring out what the hell is going on, what they should do to contribute, and who they need to coordinate with to get things done.

Organizer[edit]

Contact: Mike Bland

Ensures the overall effort is making progress by facilitating communication, removing obstacles, and making hands-on contributions when necessary.

Tech Writer[edit]

Contacts: Mike Bland, Sandeep Mankar, Chelsea Komlo, Graham Brooks, Lisa Carey

Maintains documentation of tools, techniques, outstanding issues, etc. May maintain a code review/testing style guide based on the community consensus. Also documents who currently fills each role, as well as who might've filled it in the past.

Buildmaster[edit]

Contacts: Mike Bland, Isaac Truett, Dileep Bapat, Graham Brooks

Sets up and maintains automated builds and any infrastructure used to monitor them and report breakage.

Toolsmith[edit]

Contact: Tony Aiuto

Maintains any tools and frameworks that are a part of the core build process; proposes, integrates, and/or retires existing tools.

Tutor[edit]

Contacts: Mike Bland, Darshan Mody, Tom Francis, Brian N. Makin, Edward Knapp, Dileep Bapat, Graham Brooks

Helps contributors, particularly new contributors, with the details of writing unit or other automated tests. May be assigned by the reviewer of a patch to help a contributor get a new change under test. Submission of the patch should require the approval of the tutor in most cases. (Should ideally be more than one person.)

Tasks[edit]

The current list is posted in the OpenSSL Testing Tasks Google Spreadsheet.

(Would be nice to enable the MediaWiki Widget extension and the Google Spreadsheet widget.)

Howtos and Style Guides[edit]

Documents that provide background on specific testing issues and idioms should be linked from here.

Discussion Topics[edit]

These are discussions related to some of the current testing tasks.

First Area of Focus[edit]

Matt Caswell: One criterion might be ease of testing. If you were going down that route you might choose some of the low level packages within libcrypto which are relatively self-contained. I'm not talking about the crypto algorithms themselves (there are at least *some* tests for those), I'm thinking more about bio, pem, x509, x509v3, etc.

Another criterion might be where you get the most value. I suspect we will get the most value from a really good set of unit tests for libssl. I suspect this might be a big ask to start off with (you would probably have to mock out a lot of stuff).

Build Systems/Machines[edit]

Where can we get the machines? Is someone willing to donate them? What system should we use?

Get Unit Tests to Build on Windows[edit]

Steve Henson on building ssl/heartbeat_test.c on Windows: The first problem is that __func__ isn't understood. Changing that to __FUNCTION__ resolves that easily enough.

The second problem is more serious. Windows (and a couple of other platforms IIRC) is strict about only exporting approved symbols from global headers in shared libraries and wont export internal only functions.

That means you get linker errors:

heartbeat_test.obj : error LNK2019: unresolved external symbol _ssl3_setup_buffe rs referenced in function _set_up heartbeat_test.obj : error LNK2019: unresolved external symbol _ssl_init_wbio_bu ffer referenced in function _set_up heartbeat_test.obj : error LNK2019: unresolved external symbol _tls1_process_hea rtbeat referenced in function _set_up_tls heartbeat_test.obj : error LNK2019: unresolved external symbol _dtls1_process_he artbeat referenced in function _set_up_dtls out32dll\heartbeat_test.exe : fatal error LNK1120: 4 unresolved externals NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 11.0 \VC\BIN\link.EXE"' : return code '0x460'

Potential solutions:

ifdef'd out (for now): https://github.com/openssl/openssl/commit/7ce79a5bfdbcd53ae75f85e94eed665a05b5dea3

How to Manage Private Symbols[edit]

Steve Henson: As this also affects other platforms a general solution might be in order.

One option is to link to static libraries as has been mentioned. If you have a lot of unit tests you could end up with a lot of large binaries. Would need some changes to the dreaded Windows build system too.

A different option would be to export the unit test functions through a structure pointer which a function returns. So you'd have something like..

unit_test = SSL_get_unit_test_funcs();

unit_test.some_internal_funcction(...);

Where the actual structure is opaque to applications (so they aren't tempted to poke around in internals) but visible to the unit test code. Then the unit test code doesn't call any internal functions not defined in there and doesn't need to load private headers (unless it needs them for some definitions).

Mike Bland: This is an interesting idea, though I'd expect the unit tests will still need access to internal headers.

But I have a question: Why do any of the symbols need to be private? Is that degree of encapsulation necessary, and does it really discourage irresponsible clients? The source code is open, so people can always build their own copy and depend on internals if they really want to, and the default UNIX/Linux/OS X build (i.e. ./configure && make && make test) doesn't lock down internal symbols. Have dependencies on internal APIs ever been a problem in practice?

I'm not saying I feel strongly either way, but it seems that clients should be clear that the public API is what's available in openssl/include, and shouldn't depend on anything else. I just don't understand what making symbols private buys anyone in the context of OpenSSL, in light of how it appears to complicate the ability to unit test.

Not trying to be argumentative, I just want to understand the rationale. I'll work with whatever environment I have to.

Tom Francis: Making the symbols private allows the link editors on Windows and pretty much every UNIX system that doesn't use GNU ld to: 1) Run much, much faster at link time (not a minor consideration for most of the software I've worked on); and 2) optimize the final binary to load much faster. Last time I checked, there was a minor improvement for GNU ld for #1, but it's trivial compared to Windows and HP's link editor, e.g. (I can't overstate the difference there). The runtime improvement is also dramatic on Windows, HP-UX, and AIX (we're talking tens for milliseconds for libraries that are much smaller than OpenSSL -- never tried it with OpenSSL, too lazy to mark everything, after having just done so for the library I was really interested in).

If you want to avoid a bunch of small binaries, you could combine all the unit tests into a single test application, and use multiple runs of it for each test, similar to how the openssl binary is created; if you link the object files directly (or a static library), you avoid the private symbols issue. Of course, this may be an issue for some of the embedded systems, where the loading a large binary for running the tests might not be acceptable (but might be preferable to a bunch of binaries?). Mike Bland was concerned about parallelization, and for any kind of automated tests on commits, that's a big point, if you can guarantee that each system only compiles/executes a limited number of tests (it could be useful for OS vendors, too). For the average OpenSSL user, that's not really a consideration; an average user is more likely concerned with how long (wall clock) it takes to compile and run the tests, and depending on the size of a single binary v the size of each smaller binary, it'll vary from system to system (generally, it'll be many times faster to create a single large binary than a few hundred smaller binaries, but the larger the binary, the longer it takes to load -- at some point, the load time might well outweigh the hundreds of smaller binaries (assuming you execute the large binary several times, rather than have a method for it to execute all tests in a single go). That point is probably > 100MB for larger systems, but for an embedded system, it could be much smaller.

For Steve's second suggestion, I think my biggest concerns would be how to logically separate out a unit test method without including the test framework (maybe throw it all in there?). I kinda like the idea of keeping the test inside the libraries, but I'm not sure why -- maybe 'cause I recently spent a few months back in a java world, where it's normal to throw all your unit tests into the same jar for production. I'm not sure I like that, though. The overhead of a single function per "module" added to the exports is also unlikely to drastically decrease link-time or run-time performance, too (compared with exporting everything).

Improved Patch/Cherry-pick Process[edit]

Geoff Thorpe: (a) patches that are not applied/cherry-picked to all the branches they apply to, or (b) applied to more branches than they should be. A complication of (a) is that some patches may need to be reworked in order to apply (or in order to not break binary compatibility of a stable branch). The difficulty of (b) is determining when the nature of the change either breaks binary compatibility or causes some other behavioral deviation that is not in keeping with the "promises" of stable branches.

Geoff on 2014-05-30: ...we might move to a scheme whereby we only maintain one most-recent stable branch for production releases, with snapshots or whatever of the development head. Ie. two branches, one for maximum-compatibility/bug-fixes-only, and the other for more active development. The number of "stable" branches would thus be 1 rather than 'n'. The idea is that we may defer to downstream "slow-moving" users (eg. long-term releases of linux distros) the opportunity to step up and do the work of maintaining (cherry-picking, back-porting, ...) older branches. Ie. what they typically already do in-house, but the opportunity to do it under our roof and thus combine their efforts with any other similarly-interested parties. If there's noone prepared to do that, the logical conclusion is that there is probably no need for it (or so the argument goes).

Improve/Standardize Algorithm Testing[edit]

Steve Henson: While the algorithms are partly tested I don't think they're anything like comprehensive enough. The way they are handled is also haphazard and spread across multiple tests covering all sorts of odd APIs, some of which are deprecated.

In fact I'd say they're (with the odd exception) an example of how *not* to do things.

Examples rc2test tests RC2 it has various test vectors hard coded inconsistent indentation and use of low level APIs. The rc4test is a similar story and uses a different format to rc2test. The same tale is repeated all over the place with ideatest, rmdtest etc etc.

Then we come to the code which I think is closest to how things should be which is evp_test. It uses the standard high level EVP API the format is consistent (if undocumented) and adding new tests can be done simply by editing a text file.

It currently however only works for some algorithms and the text file format could be better. It currently doesn't handle public key algorithms or test variants of the same algorithm (e.g. AES-NI versus non AES-NI) either.

Develop SSL/TLS Stack and Other Tools[edit]

Steve Henson: The heartbeat bug is an example where you can perform some testing without a complete SSL/TLS stack. However looking through the code in heartbeat_test.c it uses a lot of functionality which would normally be strictly off-limits for an application.

Some bugs are far more subtle and harder to test. For example some involve doing *almost* everything right and then slipping in some illegal parameter or badly formatted message. If you want to do that in a standalone bit of code you need a near complete SSL/TLS stack.

I can think of a couple of ways to handle that. One is to have some kind of hooks in libssl so testing code can misbehave and send out illegal messages in controlled ways (I had a prototype "bad heartbeat" API which only added a couple of lines to libssl). An alternative would be to write some kind of tiny SSL/TLS code used only by the test programs and carefully tailored to permit such things.

@@ Line 1: / Line 1: @@
 This page will track the effort to improve unit test and other automated test coverage for OpenSSL. Everyone is encouraged to help with this effort, but this page will track which community members have accepted specific responsibilities, and what tasks are currently underway.
+== Goals ==
+The goals of this project are:
+* Establish a written contribution/code review policy that requires:
+** tests for every new code change, except for very trivial changes that do not directly impact program logic (e.g. documentation fixes, string constant updates, build script tweaks); this can be limited to an agreed-upon subset of the code to start, but would hopefully be expanded to cover most, if not all, of the code base
+** tests for every bugfix
+** smaller, well-tested changes building up to a complete feature (as opposed to large and/or untested changes that implement a large, complex feature all at once)
+* Establish a schedule of Build Cops and Tutors to begin enforcing rollbacks/fixes and to help shepherd new contributors through the testing process.
+* Set up a series of publicly-accessible continuous integration servers using whatever system the community prefers (e.g. Jenkins, Buildbot, etc.). These can be dedicated hardware or possibly Amazon Web Services-based. Establish a written policy that these remain green at all times, and that breakages be rolled-back or fixed within no more than 30 minutes.
+We can add more concrete goals to this list after we make progress on these. Getting people to own roles, setting up build infrastructure, and setting and enforcing policy are important first steps. Upon that foundation, we can begin to make serious progress towards improved test coverage, code quality, and defect prevention.
+== Discussion List ==
+[https://groups.google.com/d/forum/openssl-testing openssl-testing] is the mailing list dedicated to the unit/automated testing effort, to avoid clogging openssl-dev. Membership is open to whoever wishes to join, even if only to lurk. Ideally all testing discussions will eventually move to openssl-dev once we have processes, tools, conventions, etc. in place.
 == Team ==
 We can define new roles as needed. The success of this effort will be inversely proportional to the number of roles with any one individual's name next to them. Each role can have more than one person filling it, but needs specific people to fill them.
 The point is not to tell people what to do and not do: it’s to establish clear responsibilities so everyone can run with them right away and begin applying their creative energy to solutions, rather than waste it figuring out what the hell is going on, what they should do to contribute, and who they need to coordinate with to get things done.
-'''Organizer''': [mailto:mbland@acm.org Mike Bland]<br />
+=== Organizer===
+Contact: [mailto:mbland@acm.org Mike Bland]
 Ensures the overall effort is making progress by facilitating communication, removing obstacles, and making hands-on contributions when necessary.
-'''Tech Writer''': [mailto:mbland@acm.org Mike Bland], Sandeep Mankar, Chelsea Komlo<br />
+=== Tech Writer ===
+Contacts: [mailto:mbland@acm.org Mike Bland], Sandeep Mankar, Chelsea Komlo, Graham Brooks, Lisa Carey
 Maintains documentation of tools, techniques, outstanding issues, etc. May maintain a code review/testing style guide based on the community consensus. Also documents who currently fills each role, as well as who might've filled it in the past.
-'''Buildmaster''': [mailto:mbland@acm.org Mike Bland], Isaac Truett, Dileep Bapat<br />
+=== Buildmaster ===
+Contacts:  [mailto:mbland@acm.org Mike Bland], Isaac Truett, Dileep Bapat, Graham Brooks
 Sets up and maintains automated builds and any infrastructure used to monitor them and report breakage.
-'''Toolsmith''': Tony Aiuto
+=== Toolsmith ===
+Contact: Tony Aiuto
 Maintains any tools and frameworks that are a part of the core build process; proposes, integrates, and/or retires existing tools.
-'''Tutor''': [mailto:mbland@acm.org Mike Bland], Darshan Mody, Tom Francis, Brian N. Makin, Chelsea Komlo, Edward Knapp, Dileep Bapat<br />
+=== Tutor ===
+Contacts: [mailto:mbland@acm.org Mike Bland], Darshan Mody, Tom Francis, Brian N. Makin, Edward Knapp, Dileep Bapat, Graham Brooks
 Helps contributors, particularly new contributors, with the details of writing unit or other automated tests. May be assigned by the reviewer of a patch to help a contributor get a new change under test. Submission of the patch should require the approval of the tutor in most cases. (Should ideally be more than one person.)
-== Goals ==
-We can add more concrete goals to this list after we make progress on these. Getting people to own roles, setting up build infrastructure, and setting and enforcing policy are important first steps. Upon that foundation, we can begin to make serious progress towards improved test coverage, code quality, and defect prevention.
-* Establish a written contribution/code review policy that requires:
-** tests for every new code change, except for very trivial changes that do not directly impact program logic (e.g. documentation fixes, string constant updates, build script tweaks); this can be limited to an agreed-upon subset of the code to start, but would hopefully be expanded to cover most, if not all, of the code base
-** tests for every bugfix
-** smaller, well-tested changes building up to a complete feature (as opposed to large and/or untested changes that implement a large, complex feature all at once)
-* Establish a schedule of Build Cops and Tutors to begin enforcing rollbacks/fixes and to help shepherd new contributors through the testing process.
-* Set up a series of publicly-accessible continuous integration servers using whatever system the community prefers (e.g. Jenkins, Buildbot, etc.). These can be dedicated hardware or possibly Amazon Web Services-based. Establish a written policy that these remain green at all times, and that breakages be rolled-back or fixed within no more than 30 minutes.
 == Tasks ==
@@ Line 40: / Line 63: @@
 * [[How To Write Unit Tests For OpenSSL]]
 * [http://mike-bland.com/2014/06/05/pseudo-xunit-pattern.html The Pseudo-xUnit Pattern]
+* [[Testing and Development Tools and Tips]]
 == Discussion Topics ==
@@ Line 77: / Line 101: @@
 ifdef'd out (for now): https://github.com/openssl/openssl/commit/7ce79a5bfdbcd53ae75f85e94eed665a05b5dea3
+=== How to Manage Private Symbols ===
+Steve Henson: As this also affects other platforms a general solution might be in order.
+One option is to link to static libraries as has been mentioned. If you have a
+lot of unit tests you could end up with a lot of large binaries. Would need some
+changes to the dreaded Windows build system too.
+A different option would be to export the unit test functions through a
+structure pointer which a function returns. So you'd have something like..
+unit_test = SSL_get_unit_test_funcs();
+unit_test.some_internal_funcction(...);
+Where the actual structure is opaque to applications (so they aren't tempted to
+poke around in internals) but visible to the unit test code. Then the unit test
+code doesn't call any internal functions not defined in there and doesn't need
+to load private headers (unless it needs them for some definitions).
+Mike Bland: This is an interesting idea, though I'd expect the unit tests will
+still need access to internal headers.
+But I have a question: Why do any of the symbols need to be private?
+Is that degree of encapsulation necessary, and does it really
+discourage irresponsible clients? The source code is open, so people
+can always build their own copy and depend on internals if they really
+want to, and the default UNIX/Linux/OS X build (i.e. ./configure &&
+make && make test) doesn't lock down internal symbols. Have
+dependencies on internal APIs ever been a problem in practice?
+I'm not saying I feel strongly either way, but it seems that clients
+should be clear that the public API is what's available in
+openssl/include, and shouldn't depend on anything else. I just don't
+understand what making symbols private buys anyone in the context of
+OpenSSL, in light of how it appears to complicate the ability to unit
+test.
+Not trying to be argumentative, I just want to understand the
+rationale. I'll work with whatever environment I have to.
+Tom Francis: Making the symbols private allows the link editors on Windows and pretty much every UNIX system that doesn't use GNU ld to:
+) Run much, much faster at link time (not a minor consideration for most of the software I've worked on); and
+) optimize the final binary to load much faster.
+Last time I checked, there was a minor improvement for GNU ld for #1, but it's trivial compared to Windows and HP's link editor, e.g. (I can't overstate the difference there).  The runtime improvement is also dramatic on Windows, HP-UX, and AIX (we're talking tens for milliseconds for libraries that are much smaller than OpenSSL -- never tried it with OpenSSL, too lazy to mark everything, after having just done so for the library I was really interested in).
+If you want to avoid a bunch of small binaries, you could combine all the unit tests into a single test application, and use multiple runs of it for each test, similar to how the openssl binary is created; if you link the object files directly (or a static library), you avoid the private symbols issue.  Of course, this may be an issue for some of the embedded systems, where the loading a large binary for running the tests might not be acceptable (but might be preferable to a bunch of binaries?).  Mike Bland was concerned about parallelization, and for any kind of automated tests on commits, that's a big point, if you can guarantee that each system only compiles/executes a limited number of tests (it could be useful for OS vendors, too).  For the average OpenSSL user, that's not really a consideration; an average user is more likely concerned with how long (wall clock) it takes to compile and run the tests, and depending on the size of a single binary v the size of each smaller binary, it'll vary from system to system (generally, it'll be many times faster to create a single large binary than a few hundred smaller binaries, but the larger the binary, the longer it takes to load -- at some point, the load time might well outweigh the hundreds of smaller binaries (assuming you execute the large binary several times, rather than have a method for it to execute all tests in a single go).  That point is probably > 100MB for larger systems, but for an embedded system, it could be much smaller.
+For Steve's second suggestion, I think my biggest concerns would be how to logically separate out a unit test method without including the test framework (maybe throw it all in there?).  I kinda like the idea of keeping the test inside the libraries, but I'm not sure why -- maybe 'cause I recently spent a few months back in a java world, where it's normal to throw all your unit tests into the same jar for production.  I'm not sure I like that, though.  The overhead of a single function per "module" added to the exports is also unlikely to drastically decrease link-time or run-time performance, too (compared with exporting everything).
 === Improved Patch/Cherry-pick Process ===
-Geoff Thorpe: (a) patches that are not applied/cherry-picked to all the branches they apply to, or (b) applied to more branches than they should be. A complication of (a) is that some patches may need to be reworked in order to apply (or in order to not break binary compatibility of a stable branch). The difficulty of (b) is determining when the nature of the change either breaks binary compatibility or causes some other behavioral deviation that is not in keeping with the "promises" of stable branches.
+Geoff Thorpe: (a) patches that are not applied/cherry-picked to all the branches they apply to, or (b) applied to more branches than they should be. A complication of (a) is that some patches may need to be reworked in order to apply (or in order to not break binary compatibility of a stable branch). The difficulty of (b) is determining when the nature of the change either breaks [[Binary_Compatibility|binary compatibility]] or causes some other behavioral deviation that is not in keeping with the "promises" of stable branches.
-http://ispras.linuxbase.org/index.php/ABI_compliance_checker
-https://github.com/lvc/abi-compliance-checker/
 Geoff on 2014-05-30: ...we might move to a scheme whereby we only maintain one most-recent stable branch for production releases, with snapshots or whatever of the development head. Ie. two branches, one for maximum-compatibility/bug-fixes-only, and the other for more active development. The number of "stable" branches would thus be 1 rather than 'n'. The idea is that we may defer to downstream "slow-moving" users (eg. long-term releases of linux distros) the opportunity to step up and do the work of maintaining (cherry-picking, back-porting, ...) older branches. Ie. what they typically already do in-house, but the opportunity to do it under our roof and thus combine their efforts with any other similarly-interested parties. If there's noone prepared to do that, the logical conclusion is that there is probably no need for it (or so the argument goes).