Data is perhaps an organization’s most critical asset. Applications that operate on, manipulate and store data must include well-managed security. Information such as personally identifiable information and corporate secrets needs to be protected not only in transit but also in storage. Moreover, the passage of recent laws and standards, such as California’s SB 1386 and the credit card industry’s PCI standards, implies that protecting customer information is key to the survival of an organization. A lack of data protection can have devastating consequences, as recent examples have shown.
Continuing in this second installment of our series of articles (see Software Magazine Fall Edition, “Do Configuration Management During Design & Development”), we now move to considering security code review as well as penetration testing of applications, both of which are used for threat modeling. Two major attack avenues exist for application data: during transit and in a data store. Handling the risks associated with these avenues is not necessarily complex, and tried-and-tested solutions exist. However, the devil is in the details of the implementation, and data protection is easy to get wrong. Moreover, compliance requirements can often govern these details, including the choice of algorithms, key lengths and other parameters. For instance, many government agencies demand Federal Information Processing Standards (FIPS) compliance, which automatically rules out a host of algorithms and functions as being weak. It is therefore important that developers dealing with such requirements not only be cognizant of them but also have the tools necessary to achieve compliance.
The science and art of cryptography is often treated as a shiny red button that will solve all security problems. Countless times as a security review begins, we have heard, “We use SSL so we are secure.” This is indicative of both the complexity of the topic and how little practitioners understand it. On the positive side, though, with the advent of a number of software development frameworks such as J2EE and Microsoft .NET, access to data protection mechanisms has been made far easier, and developers have at their disposal implementations of best-of-breed algorithms and protocols as simple class and object methods.
Security Properties
Security 101 teaches you that some of the most critical security properties are confidentiality, integrity and availability, collectively called the CIA properties. Of these, cryptography most often makes confidentiality and integrity possible. Besides these two, secure systems often also have to deal with authentication and nonrepudiation [1]. Let’s start with what each of these properties means and how they differ from each other:
Confidentiality: Achieving confidentiality means restricting information access to those privileged to see it. Network sniffing is an example of a violation of confidentiality. Other examples include disclosure through log files and exception and error messages.
Integrity: Data integrity means trusting that the information has not been altered between its transmission and reception. Source integrity is trusting that the information sender is who it is supposed to be, which is often termed “authenticity.” Data integrity is compromised when information has been corrupted, willfully or accidentally, before the intended recipient has read it. Source integrity is compromised when an agent spoofs its identity and supplies incorrect information to a recipient.
Authentication: This is the process by which an entity attempts to confirm that another entity that has sent information to the first entity is who it claims to be. Authentication is not to be confused with authorization or access control, which defines what an entity can do after it has been authenticated.
Nonrepudiation: This is also often known as accountability and is tremendously popular in the government and financial services sectors. The nonrepudiation of receipt of information means that an agent can’t deny receiving information. The nonrepudiation of sourcing information means that an agent can’t deny sending the information. Nonrepudiation is essentially intended to prevent an entity from denying a transaction took place. For instance, consider what would happen if malicious users could withdraw money from an ATM and then deny they did so.
Cryptography Primitives
Data protection is achieved by encapsulating a number of primitive building blocks into the final cryptographic solution such as SSL or digital signatures. From an application development perspective, making an incorrect choice with respect to any one of these primitives can undermine the entire solution’s security.
Random-Number Generation: Random-number generators are used in the generation of session identifiers, entropy for key generation and other cryptographic functions, as well as for application-specific purposes such as account identifiers and password resets. It is important to distinguish, however, between cryptographically secure random numbers and those that are not. For instance, random numbers generated using the C rand function or Java’s Math.random are not cryptographically secure. These functions lack a key property, unpredictability of future output given past output, that is available in classes such as SecureRandom in Java and RNGCryptoServiceProvider in .NET.
These latter classes are often called cryptographically secure pseudorandom number generators (PRNGs), and they produce sequences of generated numbers that are approximately independent of each other. Truly random numbers, by contrast, are usually generated through the use of specialized hardware measuring physical phenomena such as radioactive decay. Pseudorandom numbers have periodicity, which means that after a given interval the algorithm will begin repeating its sequence of numbers. The best PRNGs are therefore those that have an extremely large period. Associated with this process is also a secret value called a seed, which represents the starting point in the sequence. Disclosure of this value can compromise the randomness properties, since anyone who knows the seed can reproduce the sequence. Bugs surrounding improper use of random numbers have affected the best of applications and protocols, including early versions of the Secure Sockets Layer (SSL) protocol.
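The distinction between the two kinds of generator can be sketched in a few lines. The example below is a minimal illustration in Python, which the article itself does not use; the function names new_session_id and predictable_id are hypothetical. Python's secrets module plays the role that SecureRandom plays in Java, while the random module is a general-purpose generator comparable to Math.random.

```python
import random
import secrets


def new_session_id(num_bytes: int = 32) -> str:
    """Return a URL-safe session identifier drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(num_bytes)


def predictable_id(seed: int) -> int:
    """Anti-example: a general-purpose PRNG is fully determined by its seed."""
    return random.Random(seed).getrandbits(64)


# Same seed, same "random" value: anyone who learns the seed can replay it.
assert predictable_id(1234) == predictable_id(1234)

# Identifiers from the cryptographically secure generator are independent draws.
session_a = new_session_id()
session_b = new_session_id()
print(session_a != session_b)
```

The design point is simply that identifiers which act as secrets (session IDs, password-reset tokens) should come from the second kind of generator, never the first.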
Hash Functions: Also called one-way functions or checksums, hash functions operate on arbitrarily long data, reducing it to a fixed-length fingerprint. For instance, the popular (but now weak) Message Digest 5 algorithm (MD5) produces a 128-bit hash, while the Secure Hash Algorithm 1 (SHA1) generates a 160-bit output. The best hash functions have three key properties:
1. One way: Hash functions must be one way to be effective. That is, it should not be possible to reverse a hash function: to go from the output hash value to the original input. This property is extremely useful and is most popularly used for password storage, so that even if the password database is compromised, all is not lost. This property is often called preimage resistance.
2. Second-order preimage resistance: Given a specific input, it should be computationally infeasible to find another input that generates the same hash as the first.
3. Collision resistance: It should be computationally hard to find any two distinct inputs that generate the same hash. In recent months this specific property has been in the news quite a bit [2], as people have questioned the effectiveness and use of algorithms such as MD5. In reality, just based on the definition given here, it is fairly obvious that collisions must exist since the input set is infinitely bigger than the output set. The key aspect of a good hash algorithm, however, is to make these collisions difficult to find. As a rule of thumb, the best hash functions operate so that changing even a single bit in the input alters roughly half the bits in the output.
With these key properties in mind, application developers would do well to stay ahead of the curve: avoid MD5 entirely and begin to migrate away from SHA1 as well, toward stronger algorithms such as SHA256.
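The output sizes and the single-bit rule of thumb described above can be checked directly. The following is a minimal sketch using Python's standard hashlib module; the sample message and the helper name bit_difference are assumptions made for illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Digest sizes, in bits, of the algorithms named in the text.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256")
}

# Flip a single bit of the input and compare the two SHA256 digests.
original = b"transfer 100 to account 42"
flipped = bytes([original[0] ^ 0x01]) + original[1:]
changed_bits = bit_difference(
    hashlib.sha256(original).digest(),
    hashlib.sha256(flipped).digest(),
)

print(digest_bits)
print(f"{changed_bits} of 256 output bits changed")
```

For a well-behaved 256-bit hash, the count printed on the last line should land somewhere near 128, which is the "roughly half the bits" behavior described above.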
Even though hash functions are one way, an attacker who steals the password database can still launch an offline attack by computing the hash of every possible password. This is a computationally intensive task, so attackers have attempted to gain the upper hand by precomputing the hashes and distributing them as rainbow tables [3]. An offline attack then becomes as simple as performing a database lookup. To deal with this, most effective authentication mechanisms introduce a cryptographically random value called a “salt.” This salt is combined with the actual password before hashing it. As a result, the attacker now needs to compute not just the hash of every possible password but the hash of every possible password with every possible salt value, which makes precomputation impractical. The system must, of course, store the salt value used for each account so that it can authenticate the user.
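A salted scheme along these lines might look like the sketch below. It is written in Python and goes one step beyond a single salted hash by using PBKDF2, a standard password-based key derivation function that repeats the hash many times to slow offline guessing. The iteration count, salt length and function names are assumptions for illustration, not recommendations from the article.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware
SALT_BYTES = 16       # assumed salt length


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both are stored alongside the account."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per account
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because every account gets its own salt, two users with the same password end up with different stored values, and a precomputed table built for one salt is useless against another.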
Finally, hash functions should not be confused with encryption. Hash functions are one way and keyless, whereas encryption functions typically have decryption ability and usually leverage a key. Thus, hashing should not be used to protect confidentiality but is an effective mechanism for integrity checking. Hash functions are often combined with keys to produce what are called HMACs (keyed-hash message authentication codes). However, in these cases they only provide authentication and integrity and do not by themselves guarantee confidentiality.
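The keyed-hash idea can be sketched with Python's standard hmac module. The key, the message and the helper names below are hypothetical; note that the message itself travels in the clear, which is exactly why an HMAC gives authentication and integrity but not confidentiality.

```python
import hashlib
import hmac
import secrets


def tag_message(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_message(key, message), tag)


shared_key = secrets.token_bytes(32)  # known only to sender and receiver
message = b"ship 3 units to warehouse 7"
tag = tag_message(shared_key, message)

print(is_authentic(shared_key, message, tag))                          # untouched
print(is_authentic(shared_key, b"ship 9 units to warehouse 7", tag))   # tampered
```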
Encryption and Signatures: Encryption and signatures are often mentioned in the same breath since they are not that different, even though they attempt to achieve different security properties. Essentially, both are functions that take a piece of plain-text input data and a cryptographic key and then generate an output. In the case of encryption, this output, often called the cipher text, cannot be reversed back into the plain text without knowledge of both the algorithm used and the key. This also implies that the choice of the algorithm and the key are critical.
Application developers must never attempt to build their own cryptographic algorithms but should rely on established algorithms such as RSA and AES. Those algorithms have been through tremendous public and expert scrutiny and have stood the proverbial test of time. Moreover, creating a strong, effective cryptographic algorithm is no easy task. It requires in-depth knowledge of fields such as probability and number theory. Similarly, application development teams should refrain from using so-called secret cryptography sold by many vendors. Such implementations, often referred to as snake oil by the security community [4], can cause more harm than good since they are not known to be secure.
Encryption algorithms are typically classified as symmetric or asymmetric. As the name suggests, symmetric algorithms use the same algorithm and key to both encrypt and decrypt the data. Thus, for two entities to share information using symmetric cryptography, they both need to agree on a specific algorithm and then share a key. Symmetric encryption algorithms are typically much faster than asymmetric ones and are therefore preferred for bulk encryption. But they have a significant disadvantage: the problem of key distribution. If multiple entities are looking to communicate among themselves with each pair maintaining its own confidentiality (and/or integrity), each pair needs to share a different key. To distribute these keys, you need a secure channel in the first place, so symmetric key implementations can often become a chicken-and-egg problem.
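To see how quickly the key distribution problem grows, consider how many distinct keys are needed when every pair of parties must share its own. The short Python sketch below (the function name is hypothetical) computes n(n - 1)/2 and cross-checks it by enumerating the pairs.

```python
from itertools import combinations


def pairwise_keys_needed(parties: int) -> int:
    """Number of distinct symmetric keys if every pair shares its own key."""
    return parties * (parties - 1) // 2


# Cross-check the formula by listing every pair among five named parties.
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
pairs = list(combinations(names, 2))
print(len(pairs), pairwise_keys_needed(len(names)))

# The count grows quadratically with the number of parties.
for n in (10, 100, 1000):
    print(n, pairwise_keys_needed(n))
```

Each of those keys has to reach exactly two parties over a channel that is already secure, which is the chicken-and-egg problem in concrete terms.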
Asymmetric algorithms use one key (usually called the public key) to encrypt and a different one (called the private key) to decrypt. Thus, if Alice wants to send Bob a confidential message, she first encrypts it with Bob’s public key, which Bob can publicly broadcast on his Web site or through some kind of key directory, which is the central component in any public key infrastructure (PKI). Once this data has been encrypted, it can only be decrypted by the other key of the pair, in this example Bob’s private key. This type of algorithm thus solves the key distribution problem described above. However, due to their very nature, asymmetric implementations are much slower than symmetric algorithms. As a result, they are not suitable for bulk encryption during, for instance, an SSL session.
You create digital signatures using a very similar process. In our example, if Alice wants to digitally sign a piece of data before sending it to Bob, she “encrypts” the data with her private key to generate the cipher text. This cipher text represents the digital signature and can be verified using Alice’s public key, which is widely available. Assuming Alice keeps her private key a well-guarded secret, as she should, no one else can generate the same signature. As such, it is an excellent mechanism for authentication and integrity protection. In practice, to prevent the signature from being as large as the data itself, the data is hashed before being “encrypted” using the private key.
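The hash-then-sign flow can be illustrated with textbook RSA arithmetic. The sketch below is a toy in Python, not a usable signature scheme: the modulus is built from two well-known Mersenne primes and is only about 150 bits long, no padding is applied (real schemes add padding such as PSS), and all names are hypothetical. It only shows that a value produced with the private exponent can be checked by anyone holding the public one.

```python
import hashlib

# Toy parameters, for illustration only. Real keys use large random primes.
P = 2**61 - 1                        # a Mersenne prime
Q = 2**89 - 1                        # another Mersenne prime
N = P * Q                            # public modulus (about 150 bits)
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message, then reduce the digest so it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the digest with the private exponent."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public exponent and compare digests."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"Alice pays Bob 10")
print(verify(b"Alice pays Bob 10", signature))    # genuine message
print(verify(b"Alice pays Bob 1000", signature))  # altered message
```

In practice, developers would call the signature classes of their framework rather than perform this arithmetic themselves, in line with the advice above against building one's own cryptography.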
As you would expect, encryption and signatures can be combined. Developers are typically advised to first encrypt and then sign the data they are looking to keep confidential. Developers must bear in mind issues such as surreptitious forwarding and repudiation, which are beyond the scope of this article [5]. Similarly, an appropriate choice of block cipher modes is also critical. For instance, modes such as the electronic code book (ECB) are known to be insecure and can result in information disclosure, as shown in the figures below [6]. Stronger modes such as cipher block chaining (CBC) are always preferable.
For symmetric algorithms, developers and architects are strongly advised to avoid using the Data Encryption Standard (DES). Instead, they should use a minimum of 3DES and, as far as possible, the Advanced Encryption Standard (AES/Rijndael). For asymmetric algorithms, RSA is currently the most commonly used. However, public key algorithms should typically only be used for one-time activities such as key exchange or session setup.
Key Management
As suggested, the effectiveness and utility of a cryptographic implementation are primarily based on the secrecy of the key. As such, the security of the keys must clearly be paramount during every stage of their lifetime. Three of these critical stages are the following:
Key Generation: Keys must be generated with properties that strengthen the security of the overall cryptographic implementation. Two of the main properties are length and entropy. Key length can be the bane of many applications leveraging cryptography: short keys are far easier to guess than longer ones. The seminal work on appropriate key lengths was done by researchers Arjen K. Lenstra and Eric R. Verheul in 1999 [7]. To summarize their recommendations, the minimum length of a key varies over time based on computing power improvements, cryptanalytic research and the increasing budgets of attackers. Based on that analysis the following table can be constructed:
| Year | Minimum Symmetric Key Length (bits) | Minimum Asymmetric Key Length (bits) |
| --- | --- | --- |
| 1982 | 56 | 417 |
| 1987 | 60 | 539 |
| 1992 | 64 | 682 |
| 1997 | 68 | 844 |
| 2000 | 70 | 952 |
| 2001 | 71 | 990 |
| 2002 | 72 | 1028 |
| 2003 | 73 | 1068 |
| 2004 | 73 | 1108 |
| 2005 | 74 | 1149 |
| 2006 | 75 | 1191 |
| 2007 | 76 | 1235 |
| 2008 | 76 | 1279 |
| 2009 | 77 | 1323 |
| 2010 | 78 | 1369 |
In short, if you want your data to be secure through the year 2010, for instance, you must choose symmetric keys to be at least 78 bits long (128 bits in practice) and asymmetric keys to be at least 1,369 bits in length (2,048 bits in practice).
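Reading the table can also be expressed as a small lookup. The Python sketch below copies the figures from the table above; the function name and the choice to round an unlisted year up to the next listed one (the more conservative reading) are assumptions made for illustration.

```python
import bisect
from typing import Tuple

# year -> (minimum symmetric bits, minimum asymmetric bits), from the table
MIN_KEY_BITS = {
    1982: (56, 417), 1987: (60, 539), 1992: (64, 682), 1997: (68, 844),
    2000: (70, 952), 2001: (71, 990), 2002: (72, 1028), 2003: (73, 1068),
    2004: (73, 1108), 2005: (74, 1149), 2006: (75, 1191), 2007: (76, 1235),
    2008: (76, 1279), 2009: (77, 1323), 2010: (78, 1369),
}


def minimum_key_lengths(secure_through: int) -> Tuple[int, int]:
    """Return the table row for the first listed year >= secure_through."""
    years = sorted(MIN_KEY_BITS)
    index = bisect.bisect_left(years, secure_through)
    if index == len(years):
        raise ValueError("year is beyond the range covered by the table")
    return MIN_KEY_BITS[years[index]]


print(minimum_key_lengths(2010))  # the row discussed in the text
print(minimum_key_lengths(1999))  # not listed, so the 2000 row is used
```

The returned values are lower bounds; as noted above, in practice one rounds up to the standard sizes of 128 and 2,048 bits.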
As is implied above, development teams must distinguish between long-term and short-term, or ephemeral, keys. The latter are very commonly used as session keys in protocols such as SSL and IPSec (IP security). Given their short lifetime, they do not need to be as long as long-term keys. Long-term keys, by contrast, are often used for encryption and authentication of sensitive data in storage rather than just on the wire. With long-term keys, developers must also be concerned about how to perform key rotation and revocation if keys get compromised. In such cases, it is best to rely on a PKI system or trusted certification authority–based system such as the one used for digital certificates.
Key entropy is a critical ingredient in effective cryptography. Unlike passwords, keys should not be easy to remember. They should be cryptographically random, or derived from a passphrase if required [8]. This implies that developers should not attempt to create a key by using just any sequence of letters, numbers and special characters or, worse still, attempt to use the password as the key. Instead, developers should rely on functions available as part of the underlying framework. For instance, the symmetric algorithm classes in the .NET base class library, such as RijndaelManaged and TripleDESCryptoServiceProvider, have a GenerateKey method as well as a GenerateIV method for the initialization vector. Similarly, Java has the javax.crypto.KeyGenerator class. The Data Protection API (DPAPI) available on Microsoft Windows 2000 and above is an excellent symmetric encryption facility where key management is handled by the operating system. With this API the developer doesn’t have to deal with any keys; instead, the key is generated based on the logged-on user’s password hash. This mechanism has one drawback, however: It is not easily portable across machines.
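As a rough analogue of the GenerateKey and GenerateIV calls mentioned above, the following Python sketch draws key material from the operating system's cryptographically secure generator and, where a passphrase must be the starting point, stretches it with PBKDF2 rather than using it as the key directly. The function names, key sizes and iteration count are assumptions for illustration.

```python
import hashlib
import secrets


def generate_key(bits: int = 256) -> bytes:
    """Return a random symmetric key of the requested length."""
    if bits % 8 != 0:
        raise ValueError("key length must be a whole number of bytes")
    return secrets.token_bytes(bits // 8)


def generate_iv(block_bits: int = 128) -> bytes:
    """Return a random initialization vector sized to the cipher block."""
    return secrets.token_bytes(block_bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes, bits: int = 256) -> bytes:
    """Derive a key from a passphrase; the salt is stored, not kept secret."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 200_000, dklen=bits // 8
    )


key = generate_key()
iv = generate_iv()
salt = secrets.token_bytes(16)
derived = key_from_passphrase("a long memorable passphrase", salt)
print(len(key) * 8, len(iv) * 8, len(derived) * 8)
```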
Key Distribution: Once the keys have been generated, they often need to be distributed to the responsible parties involved in the communication. As discussed, this is not always an easy task. Two of the biggest threats to a cryptographic system are key disclosure and tampering to cause a denial of service. While asymmetric key algorithms make the distribution task trivial, they are not best suited for bulk encryption like their symmetric counterparts. In practice, we try to get the best of both worlds by using asymmetric key algorithms to first exchange a shared session key, which is then used for the actual bulk encryption with a symmetric algorithm. This, in fact, is the approach used by SSL.
As a rule of thumb, key distribution, if needed, should as far as possible take place offline using some out-of-band mechanism (such as over the phone). When keys must be exchanged online, the application must ensure that key material is transmitted over a secure channel such as one protected using SSL or IPSec. Only well-documented and established key exchange algorithms such as Diffie-Hellman (DH) or RSA should be used when exchanging keys online. Even in such cases, one must be careful to avoid known flawed variants such as anonymous Diffie-Hellman (ADH), which is susceptible to a man-in-the-middle (MITM) attack.
Key Storage: This is probably the number-one source of key compromise in applications. The most common mistake is to store the key within the source code that uses that key. This is especially risky in managed languages such as Java and C#, whose compiled code can easily be decompiled into higher-level language code that will quickly reveal the key being used. Keys stored in the source code are also often kept constant for the application’s lifetime and are not under controls that enable key rotation and revocation. This is often true across development and deployment, where the same key that was used during testing is used in live production. Similarly, configuration files are not the best location to save secrets such as keys. These files are typically accessible to the application and to developers and administrators, so the keys are available to a very large audience. Moreover, the impact of any Web vulnerability that gives an external user access to the configuration files is exacerbated.
With regard to secure key storage, consider first whether or not the keys need to be stored in the first place. For instance, ephemeral keys, by definition, shouldn’t be stored. If they must be stored for some reason, consider a hardware solution such as a smart card or a Universal Serial Bus (USB) stick. Many of these devices are built to be tamper resistant and to support advanced options such as splitting the key into fragments that are stored separately. When storing in software, all commonly used cryptographic frameworks such as Microsoft’s CAPI support the notion of key stores. In the Java SDK for instance, the java.security.KeyStore class and the keytool command-line utility both provide access to a protected repository for storing key material. Similarly in the .NET world, one can rely on either the user or the machine key store.
Transit Security
Even the best storage security can be undone by secrets inadvertently exposed over the wire. This could include, for instance, authentication cookies and tickets being transmitted between the browser and the server in the clear (not encrypted). Fortunately, fairly well-understood and easy solutions deal with this issue. However, like a lot of the other topics discussed in this article, the solution is also easy to get wrong, and the default options may not always be the correct ones. With transit security, this issue is even more pronounced, given that these options are often handled by an entirely different team, typically IT administrators, as opposed to being a function of the development team itself. As we discussed in the first installment of this series, it is critical that both of these groups communicate effectively and often to ensure security all around.
Most transport security mechanisms rely on establishing a secure communication tunnel between the two entities. All the security properties previously discussed are then enforced on this tunnel or envelope rather than directly on the data. While this may not appear as granular as one would like, it allows for the flexibility to layer this approach over a varied set of data protocols such as HTTP, SMTP and FTP. Two of the most common protocols are as follows:
SSL: Most developers, especially those who build Web applications, are intimately familiar with the Secure Sockets Layer (SSL), which is often treated as a silver bullet. However, SSL is only intended to provide transport security. At a protocol level, SSL lets developers tunnel sensitive data through a protected channel that can provide confidentiality, integrity and authentication. In its most common form, SSL uses server digital certificates, also known as X.509 certificates [9]. However, SSL also supports client authentication, which allows mutual authentication with both entities presenting their digital certificates.
While SSL is a tremendously flexible yet transparent solution, it requires some configuration in order to be set up correctly. Primarily, this centers around selection of a cipher suite. As part of the initial handshake, both entities must agree on a specific cipher suite they will use for the session’s duration. If the parties to the communication cannot agree on this, an SSL session cannot occur.
The components of a cipher suite include the choice of algorithms and key lengths for the initial key exchange, authentication, encryption and message integrity protection. For instance, the cipher suite DHE-RSA-AES256-SHA uses ephemeral Diffie Hellman for the key exchange, RSA for authentication, AES with a 256-bit key for encryption and SHA1 for message integrity protection. AES256-SHA uses RSA for both key exchange and authentication, AES(256) for encryption and SHA1 for integrity protection. As one might expect, all the guidance provided earlier in this article with respect to key exchange, asymmetric and symmetric algorithms and key lengths as well as hash functions must be applied to the choice of cipher suite. This is especially true since, by default, many supported cipher suites use weak encryption such as DES with a 56-bit key or flawed key exchange such as ADH or even no encryption in the so-called NULL cipher suite.
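Restricting the cipher suites an endpoint will accept is usually a matter of configuration. As one possible illustration, the Python sketch below uses the standard ssl module (which speaks TLS, the successor to SSL) to require a minimum protocol version and to exclude anonymous, NULL and DES-based suites. The policy string is an assumption written in OpenSSL cipher-list syntax, and the exact suites that result depend on the OpenSSL build installed.

```python
import ssl
from typing import List, Tuple

# Assumed policy: ephemeral key exchange with AES-GCM, and an explicit ban on
# anonymous (aNULL), unencrypted (eNULL), DES, 3DES and MD5-based suites.
CIPHER_POLICY = "ECDHE+AESGCM:DHE+AESGCM:!aNULL:!eNULL:!DES:!3DES:!MD5"


def build_context() -> ssl.SSLContext:
    """Create a client-side context with a restricted set of cipher suites."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.set_ciphers(CIPHER_POLICY)
    return context


def enabled_suites(context: ssl.SSLContext) -> List[Tuple[str, int]]:
    """List (suite name, key strength in bits) for every enabled suite."""
    return [(c["name"], c["strength_bits"]) for c in context.get_ciphers()]


for name, bits in enabled_suites(build_context()):
    print(f"{bits:>4}  {name}")
```

Printing the enabled list after applying a policy is a cheap way to confirm that the defaults have actually been overridden, which is the point made above about not trusting default cipher suite selections.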
IPSec: While SSL is designed to provide security at the transport layer, IPSec takes that one step further by moving to the network layer itself. IPSec is essentially a set of security extensions to the Internet Protocol (IP) that provides encryption, authentication and integrity protection for network streams by encrypting or authenticating all IP packets.
IPSec is a set of cryptographic protocols for both securing packet flows and exchanging keys. Packet flows can be secured with either the Encapsulating Security Payload (ESP), which provides authentication, data confidentiality and message integrity, or the Authentication Header (AH), which provides authentication and message integrity without confidentiality. The IPSec standard also defines a key exchange protocol known as the IKE (Internet Key Exchange) protocol.
IPSec is not as flexible or widely supported as SSL. But it is tremendously useful, especially in securing server-to-server communication. For instance, IPSec is ideal for locking down communication between the application server and database server or between the client and server when using .NET remoting. This can ensure for instance, that a random machine on the same network cannot connect to the database simply because it is accessible. Instead, the caller is challenged to authenticate itself, and policies can be set up on the database server to allow only connections from the application server. In situations such as .NET remoting, this can be an effective way to achieve transport confidentiality and authentication as well as to prevent unauthorized callers from invoking the remote API.
Message–Level Security: With the ever-increasing popularity of Web services, a new form of transit security is becoming mainstream. Web services bring in a unique requirement to transit security not present or relevant in the past: the need to provide cryptographic properties in multihop fashion. For instance, consider a message sent from the user to the e-commerce store. This message contains product IDs and quantity information as well as the user’s credit card details. The former only needs to be accessible to the online store, but the latter must only be decipherable to the credit card processing firm.
With existing transport security mechanisms such as SSL, this is not currently possible. That’s because SSL is a paired protocol that protects a single connection between two endpoints and is not meant for such multihop transactions. Some of the new WS-Security-related standards such as XML encryption, however, were designed to provide developers with just this ability. XML encryption and XML digital signatures allow different parts of an XML request to be encrypted or signed under different keys (including keeping sections unencrypted). Thus, when an entity receives a message, it can decrypt only the information intended for itself and cannot decipher information intended for downstream or upstream recipients, such as the credit card information in the example above. This form of transit security is often referred to as message-level security since it operates at the message level rather than at the raw network or transport protocol level.
Managing Secrets in Memory
While data protection is traditionally associated with persistent storage and transport mechanisms, attackers are becoming more sophisticated, with attacks exploiting information disclosure through crash dump files and virtual memory page files. To deal with this threat, the .NET framework, for instance, supports the notion of protected memory. This block of memory is encrypted using the Data Protection API (DPAPI) and is only decrypted just before it is accessed. This minimizes the window within which the sensitive data is left unencrypted in memory and therefore exposed. In the .NET 2.0 framework, this functionality is accessible through two classes known as ProtectedMemory [10] and SecureString [11]. These classes let you maintain byte streams (for custom objects) and strings in an encrypted manner, and they give the developer far more granular control over the garbage collector for these objects. One key caveat: If they are ever converted to their vanilla forms, a regular string for instance, all the benefits are lost since you will now have an insecure copy also being held in memory, which is not afforded the same protection as the SecureString type.
A Complex Beast
While cryptography specifically, and data protection generally, is a complex beast that is easy to get wrong, developers can keep a few rules of thumb in mind and come out as winners. These include, for instance, using a tried-and-tested cryptographic algorithm rather than creating their own, and using a cryptographic pseudorandom number generator when creating session or account identifiers. This article should help developers identify the rules of thumb that enable them to build strong data protection into their applications.
Curphey is the founder of the Open Web Application Security Project (OWASP) and is director of software security consulting at Foundstone, now a McAfee company. OWASP was created to help organizations understand and improve the security of their Web applications. Curphey is the former director of software security for Charles Schwab.
Araujo is a principal software security consultant at Foundstone. He is responsible for creating and delivering the threat modeling and security code review service lines. He is also responsible for content creation and training delivery for Foundstone’s Building Secure Software, Writing Secure Code – ASP.NET and Writing Secure Code – C/C++ class.
[1] These definitions are based on those available at http://en.wikipedia.org/wiki/CIA_triad
[2] http://www.schneier.com/blog/archives/2005/06/more_md5_collis.html
    http://rsasecurity.com/rsalabs/node.asp?id=2738
    http://www.rsasecurity.com/rsalabs/node.asp?id=2927
[3] http://www.antsight.com/zsl/rainbowcrack/
[4] http://www.interhack.net/people/cmcurtin/snake-oil-faq.html
[5] http://world.std.com/~dtd/sign_encrypt/sign_encrypt7.html
[6] http://en.wikipedia.org/wiki/Block_cipher_modes_of_operations
[7] http://citeseer.ist.psu.edu/lenstra99selecting.html
[8] Password-based key derivation functions: http://msdn2.microsoft.com/en-us/library/zb9zth5a
[9] A digital certificate is simply the server’s public key signed and attested by a trusted certification authority.
[10] http://msdn2.microsoft.com/library/fz5bt5h2(en-us,vs.80).aspx
[11] http://msdn2.microsoft.com/en-us/library/7kt014s1