mirror of https://github.com/OpenVPN/openvpn3.git synced 2024-09-20 12:12:15 +02:00
openvpn3/openvpn/common/unicode.hpp
Charlie Vigue f38e97e1c3 Eliminate some conversion warnings
- [ipv4.hpp, ipv6.hpp] In both v4 and v6 headers it is safe to cast the hex
so as to eliminate the spurious warnings.

- [lz4.hpp] Apply value clamp to the hint that is sent to the compressor
to prevent a potential conversion overflow.

- [zlib.hpp] In compress_gzip, zs.s.avail_in and zs.s.avail_out are
theoretically susceptible to overflow. To prevent this we use
numeric_cast. In decompress_gzip we do a similar thing for zs.s.avail_in
but only value clamp avail_out, since the read loop looks like it will
compensate.

- [buffmt.hpp] It's safe to cast the result of the arithmetically caused
promotion back down to char.

- [base64.hpp] In Base64 CTOR, changed type of a couple variables to
match the type of the table they generate. In decode, perform a static
cast to the type of the template elements the function is
instantiated for.

- [core.hpp] Perform static cast long --> int on value representing
number of cores. If we run on systems where there are more cores than
int can represent this will behave oddly, but this circumstance
seems unlikely at the present time.

- [environ.hpp] The casts seem to be safe but I have added a todo ticket
to evaluate this change further.

- [hexstr.hpp] In render_hex_char there were two conversion warnings
and a bug involving out of range input. Those are addressed.
In dump_hex the result of some math and logic is now clamped
to the range of acceptable input values for string::spaces
In parse_hex the result of converting from a hex string to an
integral value is cast to the template value_type

- [hostport.hpp] The static_cast should be safe because the value
produced by validate_port is range checked.

- [split.hpp] Applied numeric cast to ensure output of lex.get stays
within acceptable type limit.

- [stop.hpp] In Stop::Scope it was extremely unlikely but possible for
the vector size to exceed the limit of int. The size now has a much lower
limit applied and will throw if it is exceeded.

- [string.hpp] Changed the call to toupper/tolower so they call the
locale function template instead of the cctype C function. This
eliminates the warning and the need for the cast.

- [cliproto.hpp] The computation of mss_fix is stored in a size_t and
then assigned to an unsigned short. We clamp this assignment
to the range of unsigned short.

- [tempfile.hpp] In TempFile CTOR suffixLen is computed as one type
and consumed as another. Since the CTOR is already throwing
for a couple other error conditions, I have added a
numeric_cast to the conversion that also throws in case of a
value overflow.

- [unicode.hpp] In an 8 --> 16 bit string conversion we mask and assign
in a way the compiler can't be certain is safe even though it is safe.
Added static cast to let the compiler know it's safe. In the second case
the class uses unsigned int to store a size, and then uses it with size_t,
which generates conversion warnings. I have changed the type of size
to size_t.

- [logperiod.hpp] in log_period changed return type specification to
match the actual return type.

- [usergroup_retain_cap.hpp] In the unlikely event the caps size (size_t)
exceeds the range cap_set_flag can accept, an exception will be thrown.

- [crypto_aead.hpp] StaticKey::size provides a size_t where unsigned int
is required. We use numeric_cast to check the size() value in the
extremely unlikely event it is manipulated to exceed the allowed value.

- [packet_id.hpp] Code packs a time_t into a uint32_t for replay packet
ID protection purposes. The warning is suppressed by a mask and cast
since the 32 bit limit is baked into the protocol and the overflow itself
does not cause a severe breakage.

- [headredact.hpp] Altered code such that the type that stores the find
result is compatible with the result from find. Additionally used the
npos constant instead of -1. There is a commented out code block that
claims to be dropped due to requiring C++14 - consider just using
that.

- [csum.hpp] in csum fold and cfold one has a mask and cast, the
second is just a cast to undo a promotion. Both appear safe.

- [ipv4.hpp] Values are masked and shifted so the cast should be safe.
Added cast.

- [ping4.hpp] ICMP ID and sequence number function arguments are
changed to the same type as needed by the structure. For
IPv4 header version_len 2nd arg is int but sizeof is not, so we
cast it. IPv4 tot_len is a uint16_t so we clamp to that value
range and compute it once.

- [ping6.hpp] Enforces a value constraint on the len argument to
csum_icmp, then checks the result of some math to ensure
the result will fit in the type it has to fit. In generate_echo_request
the ICMP ID and sequence args are changed to match the
type they are assigned to in the struct, and added
numeric_cast to range check payload_len.

- [remotelist.hpp] In get_endpoint, endpoint.port is called with an
unsigned int where the function is expecting an unsigned short int.
Since parse_number_throw is a function template, we just ask it to
return the correct type now.

- [compress.hpp] In v2_push we accept an int value that is assigned to
an unsigned char we push to the buffer. I changed the function to
accept an unsigned char directly.

Added unit tests - thanks Mark Deric.

Signed-off-by: Charlie Vigue <charlie.vigue@openvpn.net>
2023-03-08 15:21:50 +00:00
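
The commit message above relies on two different ways of narrowing an integer: a range-checked cast that throws when the value does not fit (tempfile.hpp, crypto_aead.hpp) and a clamp that saturates at the target type's limit (lz4.hpp, cliproto.hpp). The following is a minimal standalone sketch of that distinction for unsigned types only. The names `checked_narrow` and `clamp_narrow` are hypothetical and are not the openvpn3 helpers; they only illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not the openvpn3 API): narrow an unsigned value,
// throwing if it does not fit in the target type.
template <typename To, typename From>
To checked_narrow(From value)
{
    static_assert(std::is_unsigned<To>::value && std::is_unsigned<From>::value,
                  "this sketch handles unsigned types only");
    // Compare in the widest unsigned type so no implicit narrowing occurs.
    if (static_cast<std::uintmax_t>(value)
        > static_cast<std::uintmax_t>(std::numeric_limits<To>::max()))
        throw std::range_error("value out of range for target type");
    return static_cast<To>(value);
}

// Hypothetical helper (not the openvpn3 API): narrow an unsigned value,
// saturating at the maximum of the target type instead of throwing.
template <typename To, typename From>
To clamp_narrow(From value)
{
    static_assert(std::is_unsigned<To>::value && std::is_unsigned<From>::value,
                  "this sketch handles unsigned types only");
    const std::uintmax_t limit = std::numeric_limits<To>::max();
    const std::uintmax_t wide = value;
    return static_cast<To>(wide > limit ? limit : wide);
}
```

A throwing cast suits code paths that already report errors through exceptions, while a clamp suits values such as an MSS or a compressor hint where a saturated value is still usable.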

307 lines
8.9 KiB
C++

// OpenVPN -- An application to securely tunnel IP networks
// over a single port, with support for SSL/TLS-based
// session authentication and key exchange,
// packet encryption, packet authentication, and
// packet compression.
//
// Copyright (C) 2012-2022 OpenVPN Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License Version 3
// as published by the Free Software Foundation.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program in the COPYING file.
// If not, see <http://www.gnu.org/licenses/>.

// General-purpose functions for dealing with unicode.

#ifndef OPENVPN_COMMON_UNICODE_H
#define OPENVPN_COMMON_UNICODE_H

#include <string>
#include <cstring>   // for std::memcpy
#include <algorithm> // for std::min
#include <memory>
#include <cctype>

#include <openvpn/common/size.hpp>
#include <openvpn/common/exception.hpp>
#include <openvpn/common/unicode-impl.hpp>
#include <openvpn/buffer/buffer.hpp>

namespace openvpn {
namespace Unicode {

OPENVPN_SIMPLE_EXCEPTION(unicode_src_overflow);
OPENVPN_SIMPLE_EXCEPTION(unicode_dest_overflow);
OPENVPN_SIMPLE_EXCEPTION(unicode_malformed);

// Return true if the given buffer is a valid UTF-8 string.
// Extra constraints:
enum
{
    UTF8_NO_CTRL = (1 << 30),  // no control chars allowed
    UTF8_NO_SPACE = (1 << 31), // no space chars allowed
};

inline bool is_valid_utf8_uchar_buf(const unsigned char *source,
                                    size_t size,
                                    const size_t max_len_flags = 0) // OR max length (or 0 to disable) with UTF8_x flags above
{
    const size_t max_len = max_len_flags & ((size_t)UTF8_NO_CTRL - 1); // NOTE -- use smallest flag value here
    size_t unicode_len = 0;
    while (size)
    {
        const unsigned char c = *source;
        if (c == '\0')
            return false;
        const int length = trailingBytesForUTF8[c] + 1;
        if ((size_t)length > size)
            return false;
        if (!isLegalUTF8(source, length))
            return false;
        if (length == 1)
        {
            if ((max_len_flags & UTF8_NO_CTRL) && std::iscntrl(c))
                return false;
            if ((max_len_flags & UTF8_NO_SPACE) && std::isspace(c))
                return false;
        }
        source += length;
        size -= length;
        ++unicode_len;
        if (max_len && unicode_len > max_len)
            return false;
    }
    return true;
}

template <typename STRING>
inline bool is_valid_utf8(const STRING &str, const size_t max_len_flags = 0)
{
    return is_valid_utf8_uchar_buf((const unsigned char *)str.c_str(), str.length(), max_len_flags);
}

// Return the byte position in the string that corresponds with
// the given character index. Return values:
enum
{
    UTF8_GOOD = 0, // succeeded, result in index
    UTF8_BAD,      // failed, string is not legal UTF8
    UTF8_RANGE,    // failed, index is beyond end of string
};

template <typename STRING>
inline int utf8_index(STRING &str, size_t &index)
{
    const size_t size = str.length();
    size_t upos = 0;
    size_t pos = 0;
    while (pos < size)
    {
        const int len = trailingBytesForUTF8[(unsigned char)str[pos]] + 1;
        if (pos + len > size || !isLegalUTF8((const unsigned char *)&str[pos], len))
            return UTF8_BAD;
        if (upos >= index)
        {
            index = pos;
            return UTF8_GOOD;
        }
        pos += len;
        ++upos;
    }
    return UTF8_RANGE;
}

// Truncate a UTF8 string if its length exceeds max_len
template <typename STRING>
inline void utf8_truncate(STRING &str, size_t max_len)
{
    const int status = utf8_index(str, max_len);
    if (status == UTF8_GOOD || status == UTF8_BAD)
        str = str.substr(0, max_len);
}

// Return a printable UTF-8 string, where bad UTF-8 chars and
// control chars are mapped to '?'.
// If the length portion of max_len_flags is > 0, print at most that
// many chars, then append "...".
// If UTF8_PASS_FMT flag is set in max_len_flags, pass through \r\n\t
// (otherwise they are mapped to a space).
// If UTF8_FILTER flag is set in max_len_flags, other control chars and
// bad UTF-8 chars are dropped instead of being mapped to '?'.
enum
{
    UTF8_PASS_FMT = (1 << 31),
    UTF8_FILTER = (1 << 30),
};

template <typename STRING>
inline STRING utf8_printable(const STRING &str, size_t max_len_flags)
{
    STRING ret;
    const size_t size = str.length();
    const size_t max_len = max_len_flags & ((size_t)UTF8_FILTER - 1); // NOTE -- use smallest flag value here
    size_t upos = 0;
    size_t pos = 0;
    ret.reserve(std::min(str.length(), max_len) + 3); // add 3 for "..."
    while (pos < size)
    {
        if (!max_len || upos < max_len)
        {
            unsigned char c = str[pos];
            int len = trailingBytesForUTF8[c] + 1;
            if (pos + len <= size
                && c >= 0x20 && c != 0x7F
                && isLegalUTF8((const unsigned char *)&str[pos], len))
            {
                // non-control, legal UTF-8
                ret.append(str, pos, len);
            }
            else
            {
                // control char or bad UTF-8 char
                if (c == '\r' || c == '\n' || c == '\t')
                {
                    if (!(max_len_flags & UTF8_PASS_FMT))
                        c = ' ';
                }
                else if (max_len_flags & UTF8_FILTER)
                    c = 0;
                else
                    c = '?';
                if (c)
                    ret += c;
                len = 1;
            }
            pos += len;
            ++upos;
        }
        else
        {
            ret.append("...");
            break;
        }
    }
    return ret;
}

template <typename STRING>
inline size_t utf8_length(const STRING &str)
{
    const size_t size = str.length();
    size_t upos = 0;
    size_t pos = 0;
    while (pos < size)
    {
        int len = std::min((int)trailingBytesForUTF8[(unsigned char)str[pos]] + 1,
                           (int)size);
        if (!isLegalUTF8((const unsigned char *)&str[pos], len))
            len = 1;
        pos += len;
        ++upos;
    }
    return upos;
}

inline void conversion_result_throw(const ConversionResult res)
{
    switch (res)
    {
    case conversionOK:
        return;
    case sourceExhausted:
        throw unicode_src_overflow();
    case targetExhausted:
        throw unicode_dest_overflow();
    case sourceIllegal:
        throw unicode_malformed();
    }
}

// Convert a UTF-8 string to UTF-16 little endian (no null termination in return)
template <typename STRING>
inline BufferPtr string_to_utf16(const STRING &str)
{
    std::unique_ptr<UTF16[]> utf16_dest(new UTF16[str.length()]);
    const UTF8 *src = (UTF8 *)str.c_str();
    UTF16 *dest = utf16_dest.get();
    const ConversionResult res = ConvertUTF8toUTF16(&src,
                                                    src + str.length(),
                                                    &dest,
                                                    dest + str.length(),
                                                    lenientConversion);
    conversion_result_throw(res);
    BufferPtr ret(new BufferAllocated((dest - utf16_dest.get()) * 2, BufferAllocated::ARRAY));
    UTF8 *d = ret->data();
    for (const UTF16 *s = utf16_dest.get(); s < dest; ++s)
    {
        *d++ = static_cast<UTF8>(*s & 0xFF);
        *d++ = static_cast<UTF8>((*s >> 8) & 0xFF);
    }
    return ret;
}

class UTF8Iterator
{
  public:
    struct Char
    {
        unsigned int len;
        unsigned char data[4];
        bool valid;

        bool is_valid() const
        {
            return valid && len >= 1 && len <= sizeof(data);
        }

        std::string str(const char *malformed)
        {
            if (is_valid())
                return std::string((char *)data, len);
            else
                return malformed;
        }
    };

    UTF8Iterator(const std::string &str_arg)
        : str((unsigned char *)str_arg.c_str()),
          size(str_arg.length())
    {
    }

    bool get(Char &c)
    {
        if (size)
        {
            unsigned int len = std::min((unsigned int)trailingBytesForUTF8[*str] + 1,
                                        (unsigned int)size);
            if (isLegalUTF8(str, len))
            {
                c.valid = true;
                c.len = std::min(len, (unsigned int)sizeof(c.data));
                std::memcpy(c.data, str, c.len);
            }
            else
            {
                c.valid = false;
                c.len = 1;
            }
            str += c.len;
            size -= c.len;
            return true;
        }
        else
            return false;
    }

  private:
    const unsigned char *str;
    size_t size;
};

} // namespace Unicode
} // namespace openvpn
#endif
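
Several functions in this header take a single `max_len_flags` argument that carries both a maximum length (in the bits below the smallest flag) and option flags (bits 30 and 31). The following standalone sketch shows how such a packed value can be split apart. The names `kNoCtrl`, `kNoSpace`, `kLenMask`, `LenFlags` and `unpack_len_flags` are hypothetical stand-ins, declared as `size_t` constants rather than the anonymous enums used in the header, and the sketch does not depend on any openvpn3 code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the UTF8_NO_CTRL / UTF8_NO_SPACE flags,
// declared as size_t so that no signed shift is involved.
constexpr std::size_t kNoCtrl = std::size_t(1) << 30;
constexpr std::size_t kNoSpace = std::size_t(1) << 31;

// Everything below the smallest flag bit is the length component.
constexpr std::size_t kLenMask = kNoCtrl - 1;

// Unpacked view of a combined "max length | flags" value.
struct LenFlags
{
    std::size_t max_len; // 0 means "no length limit"
    bool no_ctrl;
    bool no_space;
};

inline LenFlags unpack_len_flags(std::size_t v)
{
    return LenFlags{v & kLenMask, (v & kNoCtrl) != 0, (v & kNoSpace) != 0};
}
```

Under this convention a caller would OR a limit with the desired flags, for example a limit of 64 characters combined with the no-control-characters flag, and pass the result as the single `max_len_flags` argument.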