Struct bstr::Utf8Error
pub struct Utf8Error { /* fields omitted */ }
An error that occurs when UTF-8 decoding fails.
This error occurs when attempting to convert a non-UTF-8 byte string to a Rust string that must be valid UTF-8. For example, to_str is one such method.
Example
This example shows what happens when a given byte sequence is invalid, but ends with a sequence that is a possible prefix of valid UTF-8.
    use bstr::{B, ByteSlice};

    let s = B(b"foobar\xF1\x80\x80");
    let err = s.to_str().unwrap_err();
    assert_eq!(err.valid_up_to(), 6);
    assert_eq!(err.error_len(), None);
This example shows what happens when a given byte sequence contains invalid UTF-8.
    use bstr::ByteSlice;

    let s = b"foobar\xF1\x80\x80quux";
    let err = s.to_str().unwrap_err();
    assert_eq!(err.valid_up_to(), 6);
    // The error length reports the maximum number of bytes that correspond to
    // a valid prefix of a UTF-8 encoded codepoint.
    assert_eq!(err.error_len(), Some(3));

    // In contrast to the above which contains a single invalid prefix,
    // consider the case of multiple individual bytes that are never valid
    // prefixes. Note how the value of error_len changes!
    let s = b"foobar\xFF\xFFquux";
    let err = s.to_str().unwrap_err();
    assert_eq!(err.valid_up_to(), 6);
    assert_eq!(err.error_len(), Some(1));

    // The fact that it's an invalid prefix does not change error_len even
    // when it immediately precedes the end of the string.
    let s = b"foobar\xFF";
    let err = s.to_str().unwrap_err();
    assert_eq!(err.valid_up_to(), 6);
    assert_eq!(err.error_len(), Some(1));
Implementations
impl Utf8Error
pub fn valid_up_to(&self) -> usize
Returns the byte index of the position immediately following the last valid UTF-8 byte.
Example
This example shows how valid_up_to can be used to retrieve a possibly empty prefix that is guaranteed to be valid UTF-8:
    use bstr::ByteSlice;

    let s = b"foobar\xF1\x80\x80quux";
    let err = s.to_str().unwrap_err();

    // This is guaranteed to never panic.
    let string = s[..err.valid_up_to()].to_str().unwrap();
    assert_eq!(string, "foobar");
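As a further illustration, the same index can be used to split the input into its valid prefix and the undecoded remainder. The sketch below is not part of bstr: split_valid is a hypothetical helper, and it uses std::str::from_utf8 (whose error type exposes a valid_up_to accessor of the same shape) so that it compiles without external crates. With bstr, the to_str method would presumably take the place of from_utf8.

```rust
use std::str;

/// Hypothetical helper: returns the longest valid UTF-8 prefix of `bytes`
/// together with whatever follows it (empty if the whole input is valid).
fn split_valid(bytes: &[u8]) -> (&str, &[u8]) {
    match str::from_utf8(bytes) {
        // Entire input is valid, so the remainder is an empty slice.
        Ok(s) => (s, &bytes[bytes.len()..]),
        Err(err) => {
            // Everything before valid_up_to() has already been validated,
            // so decoding the prefix again cannot fail.
            let (valid, rest) = bytes.split_at(err.valid_up_to());
            (str::from_utf8(valid).unwrap(), rest)
        }
    }
}
```

The remainder begins exactly at the first offending byte, which is where error_len starts counting.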
pub fn error_len(&self) -> Option<usize>
Returns the total number of invalid UTF-8 bytes immediately following the position returned by valid_up_to. This value is always at least 1, but can be up to 3 if the bytes form a valid prefix of some UTF-8 encoded codepoint.
If the end of the original input was found before a valid UTF-8 encoded codepoint could be completed, then this returns None. This is useful when processing streams, where a None value signals that more input might be needed.
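Example
The streaming case can be sketched roughly as follows. This is an illustration under stated assumptions, not bstr's own API: decode_available and feed_chunks are hypothetical helpers, and std::str::from_utf8 stands in for to_str so the code needs no external crates (the standard library's error type offers valid_up_to and error_len accessors of the same shape). A Some(n) result is treated as a definite error and replaced with U+FFFD, while None leaves the trailing bytes buffered until the next chunk arrives.

```rust
use std::str;

/// Hypothetical helper: decodes as much of `buf` as possible.
/// Returns the decoded text and the number of bytes consumed.
fn decode_available(buf: &[u8]) -> (String, usize) {
    let mut out = String::new();
    let mut pos = 0;
    while pos < buf.len() {
        match str::from_utf8(&buf[pos..]) {
            Ok(s) => {
                out.push_str(s);
                pos = buf.len();
            }
            Err(err) => {
                // Copy the validated prefix; this cannot fail.
                let valid_end = pos + err.valid_up_to();
                out.push_str(str::from_utf8(&buf[pos..valid_end]).unwrap());
                pos = valid_end;
                match err.error_len() {
                    // Definitely invalid: substitute and skip n bytes.
                    Some(n) => {
                        out.push('\u{FFFD}');
                        pos += n;
                    }
                    // Possibly incomplete: wait for more input.
                    None => break,
                }
            }
        }
    }
    (out, pos)
}

/// Hypothetical driver: feeds chunks through a pending buffer and returns
/// the decoded text plus any bytes still waiting for completion.
fn feed_chunks(chunks: &[&[u8]]) -> (String, Vec<u8>) {
    let mut pending: Vec<u8> = Vec::new();
    let mut text = String::new();
    for chunk in chunks {
        pending.extend_from_slice(chunk);
        let (decoded, used) = decode_available(&pending);
        text.push_str(&decoded);
        // Keep only the unconsumed tail for the next round.
        pending.drain(..used);
    }
    (text, pending)
}
```

For instance, feeding the two chunks b"caf\xC3" and b"\xA9 \xFFok" splits the two-byte encoding of "é" across a chunk boundary: the first call stops at the None signal and keeps \xC3 pending, and the second call completes the codepoint and substitutes the stray \xFF. Once the stream has ended, any bytes still pending can no longer be completed, so a caller would likely treat them as invalid.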