org.hd.d.pg2k.svrCore
Class Name

java.lang.Object
  extended by org.hd.d.pg2k.svrCore.Name
All Implemented Interfaces:
java.io.ObjectInputValidation, java.io.Serializable, java.lang.CharSequence, java.lang.Comparable<Name>, MemoryTools.Internable, TextUtils.CharSequence8Bit
Direct Known Subclasses:
Name.ExhibitFull

public class Name
extends java.lang.Object
implements TextUtils.CharSequence8Bit, java.lang.Comparable<Name>, java.io.Serializable, java.io.ObjectInputValidation, MemoryTools.Internable

Compact immutable in-memory representation of a (usually-short, sharable-prefix/suffix) 8-bit ASCII String. This can share a common prefix and suffix with another instance, avoiding having to store the shared elements itself.

This is particularly suitable for densely-packed sorted lists of values, such as sets of exhibit names which with tend to have by necessity long shared prefixes.

This is suitable for names but also other usually-short 8-bit text values, in particular those that don't have the sort of internal structure and repetition that Compact7BitString can take advantage of.

This supports random access reasonably efficiently (eg through the CharSequence interface) but some elements will be slower to access than others. (chars at either end will tend to be the slowest to access.)

Converting to and from this form should be relatively resource-light, eg with low CPU and memory requirements, though all instances are automatically intern()ed which has some cost.

Where there is a conflict between memory consumption and CPU, this implementation is biased to minimise memory use at the possible expense of CPU.

Internally this uses a byte[] to store the data (a basic halving of in-memory size in most JVMs), and use of a common prefix with a similar (nominally lexically previous close value) should help further trim memory use.

(This implementation internally attempts to minimise the number of objects kept live and that need to be traversed to complete construction.)

These classes are not publicly constructible in order to help control instance count, and indeed may use MemoryTools.intern() to eliminate duplicates.

This uses an internal cache to attempt to find ad-hoc prefix matches with existing instances if no 'prev' value is offered during creation of a new instance. This cache is memory-sensitive and will be silently sacrificed if the system is short of memory.

Note that MemoryTools.intern() may not release a 'duplicate' if there remain other instances hanging from it.

This is not final though all its constructors are private to allow only inner-class derived types in order to preserve immutability.

As this is performance-critical code, and may have to run on slow/non-JIT JVMs, most assert()s that check purely-internally-imposed invariants are entirely eliminated from release code; those that might still be triggered by runtime issues external to this class, such as run-time presence of hashing algorithms, remain present to enabled at runtime if required.

See Also:
Serialized Form

Nested Class Summary
private static class Name.AdHocPrefixCache<TerminusKey extends java.lang.Number>
          Contains lookup cache by prefix for all Name instances.
private static class Name.CSWrapper<T extends Name>
          Effectively-immutable light-weight wrapper around the underlying text.
private static class Name.EMPTIES
          Used to hold values that may be needed before Name's own static initialisation can have completed.
static class Name.ExhibitFull
          ExhibitFullName implementation on top of SharedTermini8BitString.
static class Name.ExhibitShort
          ExhibitShortName immutable CharSequence light-weight wrapper around underlying ExhibitFullName data.
 
Field Summary
private static java.lang.ref.SoftReference<Name.AdHocPrefixCache<java.lang.Integer>> _adHocPrefixCache
          Memory-sensitive cache of Name instances indexed by fixed-length prefix and suffix; initially null, not thereafter.
private  java.lang.ref.SoftReference<byte[]> _cached_termini
          Cached termini, catenated to minimise memory overhead; never null.
private  short _prefixLen
          This is a shadow of the most-frequently accessed prefix length value; non-negative.
private static boolean ALWAYS_LOOK_FOR_EARLIER_PREV
          If true, always try to search back through prev values for an as-good or better match.
static java.util.Comparator<Name> CASE_INSENSITIVE_ORDER
          Case-insensitive Comparator for Name equivalent to TextUtils.CASE_INSENSITIVE_ORDER; not null.
private static boolean DEBUG
          Set true for extra debug/development logging output.
static Name EMPTY
          Shared empty instance; not null.
private static boolean EXTRA_ASSERTS
          Set true for extra assert()s for internal invariants, usually only in non-release code.
private  int hash
          Cache of the computed hash code.
private static int MAX_PREFIX_LENGTH
          Max shared prefix length in characters; strictly positive.
private static int MAX_PREV_CHAIN_LENGTH
          Maximum 'prev' chain length that we will support; non-negative.
private static int MAX_SUFFIX_LENGTH
          Max shared prefix length in characters; strictly positive.
private static int MAX_TRIVIAL_COPY
          Maximum number of chars to take a copy of the data.
private  byte[] mid
          Middle portion of value, or a constant empty value iff there are no middle bytes; never null.
private static int MIN_SHARE_CHARS
          Minimum prefix/suffix total share length to use a prev; strictly positive.
private static int MIN_TERMINI_CACHE_CHARS
          Minimum length of name that we'll attempt to cache termini for; non-negative.
private static int MIN_TERMINUS_MATCH_LENGTH
          Minimum terminus match length; strictly positive.
private static int PREFIX_BITS
          Number of bits for prefix length (unsigned) value.
private static int PREFIX_MASK
          Mask for prefix bits (at bottom of word).
protected  Name prev
          Non-empty item with which we share a prefix and/or suffix, or null if entire value is empty (equivalent to "") or if there is no shared part(s).
private static long serialVersionUID
          Unique serialisation ID.
private static int SUFFIX_BITS
          Number of bits for suffix length (unsigned) value.
private static int SUFFIX_MASK
          Mask for suffix bits (at top of word).
private  short terminiLengths
          Termini lengths (unsigned) packed into a single value for efficiency.
 
Constructor Summary
private Name(java.lang.CharSequence text, Name putativePrev)
          Create a new instance with a previous value to attempt to share a prefix with.
 
Method Summary
private  byte[] cacheTermini()
          Make cached termini and return the cached value; never null.
 char charAt(int index)
           
private  byte charAtInternal(int index)
          Internal version of charAt() with bounds checking assumed done, etc.
 int compareTo(java.lang.CharSequence o)
          Provide natural sort order for this and derived classes.
 int compareTo(Name o)
          Provide natural sort order for this and derived classes.
private static int computeCommonPrefixLength(java.lang.CharSequence text, Name p)
          Compute the shared/common prefix length.
private static int computeCommonSuffixLength(java.lang.CharSequence text, java.lang.CharSequence p)
          Compute the shared/common suffix length.
private static java.lang.Integer computeIntegerPrefixKey(java.lang.CharSequence cs)
          Compute the Integer prefix key for the supplied CharSequence; never null.
private static java.lang.Integer computeIntegerSuffixKey(java.lang.CharSequence cs)
          Compute the Integer suffix key for the supplied CharSequence; never null.
 boolean contentEquals(java.lang.CharSequence o)
          True iff this represents the same sequence of char values as the specified sequence.
static Name create(java.lang.CharSequence text)
          Create a Name instance; never null unless input is null.
static Name create(java.lang.CharSequence text, Name prev)
          Create a Name instance with a previous value to attempt to share a prefix with; never null unless input is null.
static java.lang.CharSequence createOrStringFallback(java.lang.CharSequence text, Name prev)
          Attempts to create a Name instance, but falls back to returning an intern()ed String instance otherwise.
 boolean equals(java.lang.Object obj)
          Equality depends on the items being the same concrete (derived) type and lexically identical.
private static Name.AdHocPrefixCache<java.lang.Integer> getAdHocPrefixCache()
          Get instance of AdHocPrefixCache; never null.
 int getPrefixLen()
          Length of common prefix with prev item if any; non-negative.
 int getPrevChainLength()
          Chain depth, ie how deep our chain of prev values is, zero if this instance has no prev; non-negative.
private  int getSuffixLen()
          Length of common suffix with prev item if any; non-negative.
 int hashCode()
          The hash is computed over the entire text as for String.
 int length()
           
private  void readObject(java.io.ObjectInputStream in)
          Deserialise.
protected  java.lang.Object readResolve()
          Deserialise: validate this instance, coerce mildly-errant values, and eliminate duplicates.
 java.lang.CharSequence subSequence(int start, int end)
          Generate a light-weight wrapper, or for very small amounts of data, a copy; never null.
 byte[] toByteArray()
          Create private byte[] copy for caller; never null.
 java.lang.String toString()
          Convert to String containing same char sequence.
 void validateObject()
          Validate fields/state.
private  void writeObject(java.io.ObjectOutputStream oos)
          Write out a minimally-redundant form of our internal information.
 void writeToByteArray(byte[] buf, int offset)
          Writes the name as ASCII bytes to the given (non-null) byte array starting at the given offset.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

EMPTY

public static final Name EMPTY
Shared empty instance; not null.


MAX_PREV_CHAIN_LENGTH

private static final int MAX_PREV_CHAIN_LENGTH
Maximum 'prev' chain length that we will support; non-negative. Limits the amount of recursion and CPU involved in processing 'nested'/'chained' prefixes, at the possible expense of some lost compression (ie memory savings).

This limit also helps avoid blocking GC through overly tangled and straggly data structures.

Usually positive and in the range (say) 1 to 32 to allow compression, though higher values also hurt things such as sorting which forces lots of chain-chasing while comparing prefixes.

As at 2009/08/02 the maximum actual value when creating fresh ExhibitFull values is 18, ie 18 is sufficient to yield maximum compression/sharing.

We may need to raise or lower this limit even though serialised data already exists, so we gently coerce deserialised Name instances to cap this value where necessary.

See Also:
Constant Field Values

ALWAYS_LOOK_FOR_EARLIER_PREV

private static final boolean ALWAYS_LOOK_FOR_EARLIER_PREV
If true, always try to search back through prev values for an as-good or better match. This in principle allows less redundant intermediate values to be retained in memory and postpones any forced truncation of the chain.

See Also:
Constant Field Values

PREFIX_BITS

private static final int PREFIX_BITS
Number of bits for prefix length (unsigned) value. Should usually be greater than the number of suffix bits, since the primary compaction mechanism is prefix sharing.

9 is needed for maximal compression on 20090707 test set, but compression at 8 is very close.

See Also:
Constant Field Values

PREFIX_MASK

private static final int PREFIX_MASK
Mask for prefix bits (at bottom of word). Broken out to help pre-JIT execution performance.

See Also:
Constant Field Values

SUFFIX_BITS

private static final int SUFFIX_BITS
Number of bits for suffix length (unsigned) value. This uses whatever bits are left after prefix bits have been taken eg from 16 bits in a shared short.

See Also:
Constant Field Values

SUFFIX_MASK

private static final int SUFFIX_MASK
Mask for suffix bits (at top of word). Broken out to help pre-JIT execution performance.

See Also:
Constant Field Values

MAX_PREFIX_LENGTH

private static final int MAX_PREFIX_LENGTH
Max shared prefix length in characters; strictly positive. An actual match greater than this will be capped to this.

See Also:
Constant Field Values

MAX_SUFFIX_LENGTH

private static final int MAX_SUFFIX_LENGTH
Max shared prefix length in characters; strictly positive. An actual match greater than this will be capped to this.

See Also:
Constant Field Values

MIN_TERMINUS_MATCH_LENGTH

private static final int MIN_TERMINUS_MATCH_LENGTH
Minimum terminus match length; strictly positive. This helps eliminate completely spurious linkages that may hurt GC and trail rubbish in the serialised form of such spuriously-linked values.

We won't share very short prefixes/suffixes.

We avoid matches of "" (that should be dealt with by intern()) when this is >0. We should probably avoid picking up single letters such as trailing full-stops when this is >1. We should probably allow everything of at least length 3 or 4 (such as ".jpg" or ".au" for example) which is probably roughly the difference in bytes in a serialised stream of adaptively encoding a null vs a non-null prev pointer.

See Also:
Constant Field Values

MIN_SHARE_CHARS

private static final int MIN_SHARE_CHARS
Minimum prefix/suffix total share length to use a prev; strictly positive. If 1 then we always try to use a 'prev' when there is even one character matching, but this may entrain lots of 'dead wood' that might otherwise be garbage.

A higher value may ultimately result in lower memory consumption (and no loss of compactness with or without compression of the serialised form) though with some loss in the headline immediate 'characters saved' figure.

This does not need to be enforced on (old) deserialised data, since for example intern() may often resolve such issues silently, and in any case it is not a correctness issue.

For consistency we line this up with the minimum prefix/suffix match length.

See Also:
Constant Field Values

hash

private transient int hash
Cache of the computed hash code. Not part of the permanent/serialised state of the object since easy to recompute, but this is called so often by intern(), etc, that it is worth storing given that typical texts will be long.


prev

protected final Name prev
Non-empty item with which we share a prefix and/or suffix, or null if entire value is empty (equivalent to "") or if there is no shared part(s). May be non-empty if this new value is a prefix/suffix of the prev value.

Is protected since potentially to derived instances.


terminiLengths

private final short terminiLengths
Termini lengths (unsigned) packed into a single value for efficiency. The prefix bits are the low-order bits.

This is part of the compact serialised state.


_prefixLen

private transient short _prefixLen
This is a shadow of the most-frequently accessed prefix length value; non-negative. This contains a copy of the prefix length.

Should not expand the memory footprint of an instance because it should slot in the gap left by the terminiLengths short value.

This is not part of the serialised state.


mid

private transient byte[] mid
Middle portion of value, or a constant empty value iff there are no middle bytes; never null. We don't use default serialisation for this: it is written on the wire as raw bytes (if non-empty, else not at all) following a one-byte (positive) length if <= 127 chars, else a 4-byte (negated) length to be able to represent maximal-size CharSequences.


_cached_termini

private transient volatile java.lang.ref.SoftReference<byte[]> _cached_termini
Cached termini, catenated to minimise memory overhead; never null. Held via a SoftReference to allow automatic release if memory is stressed.

Marked transient since redundant and not part of the persisted state.

Marked volatile for lock-free thread-safe access.

When empty we can set it to the constant/empty shared value which takes up no extra space but saves a null test for this value.


MIN_TERMINI_CACHE_CHARS

private static final int MIN_TERMINI_CACHE_CHARS
Minimum length of name that we'll attempt to cache termini for; non-negative. Avoids silly memory overheads for very small instances.

See Also:
Constant Field Values

MAX_TRIVIAL_COPY

private static final int MAX_TRIVIAL_COPY
Maximum number of chars to take a copy of the data.

See Also:
Constant Field Values

serialVersionUID

private static final long serialVersionUID
Unique serialisation ID.

See Also:
Constant Field Values

_adHocPrefixCache

private static java.lang.ref.SoftReference<Name.AdHocPrefixCache<java.lang.Integer>> _adHocPrefixCache
Memory-sensitive cache of Name instances indexed by fixed-length prefix and suffix; initially null, not thereafter. Borrows a few techniques from GZIP-like algorithms.

This uses Integer as a key for up to 4 bytes of prefix/suffix lookup.

Only accessed under the class lock, by getAdHocPrefixCache().

Held via a StaticReference to allow the entire cache to be discarded automatically under extreme memory stress.

Accessed under the class lock.


CASE_INSENSITIVE_ORDER

public static final java.util.Comparator<Name> CASE_INSENSITIVE_ORDER
Case-insensitive Comparator for Name equivalent to TextUtils.CASE_INSENSITIVE_ORDER; not null. May be able to operate more efficiently than generic TextUtils.CASE_INSENSITIVE_ORDER.


EXTRA_ASSERTS

private static final boolean EXTRA_ASSERTS
Set true for extra assert()s for internal invariants, usually only in non-release code.

See Also:
Constant Field Values

DEBUG

private static final boolean DEBUG
Set true for extra debug/development logging output.

See Also:
Constant Field Values
Constructor Detail

Name

private Name(java.lang.CharSequence text,
             Name putativePrev)
Create a new instance with a previous value to attempt to share a prefix with. Note that the 'prev' value may be any sub-type of Name.

Private in order to better manage instance control.

Parameters:
text - the 8-bit text to store and must not equal the prev; never null but can be empty
putativePrev - non-empty previous value with which to attempt to share a prefix; null if none
Method Detail

create

public static Name create(java.lang.CharSequence text)
Create a Name instance; never null unless input is null. It is often best to attempt to share a prefix with the lexically previous (or next) item.

If passed a null this returns a null.

If passed a Name (but not including sub-classes) this returns it untouched.

Parameters:
text - the 8-bit text to store; if null then null is returned

create

public static Name create(java.lang.CharSequence text,
                          Name prev)
Create a Name instance with a previous value to attempt to share a prefix with; never null unless input is null. It is often best to attempt to share a prefix with the lexically previous (or next) item.

If passed a null this returns a null.

If passed a Name (but not including sub-classes) this returns it untouched.

Parameters:
text - the 8-bit text to store; if null then null is returned
prev - previous value with which to attempt to share a prefix; null if none

createOrStringFallback

public static java.lang.CharSequence createOrStringFallback(java.lang.CharSequence text,
                                                            Name prev)
Attempts to create a Name instance, but falls back to returning an intern()ed String instance otherwise. This copes will possible non-8-bit values by returning a String rather than a Name.

Essentially this returns the most memory-efficient representation reasonably possible.

Returns:
null if text is null, else Name if 8-bit text, else intern()ed String instance

computeCommonPrefixLength

private static int computeCommonPrefixLength(java.lang.CharSequence text,
                                             Name p)
Compute the shared/common prefix length.


computeCommonSuffixLength

private static int computeCommonSuffixLength(java.lang.CharSequence text,
                                             java.lang.CharSequence p)
Compute the shared/common suffix length. Note that we may have to match against some tail of the prev text (to avoid overlapping prefix/suffix parts) and thus the p parameter is the more general CharSequence.


hashCode

public final int hashCode()
The hash is computed over the entire text as for String. This does not depend on how the data is internally stored, eg using a prefix or not.

Overrides:
hashCode in class java.lang.Object

equals

public final boolean equals(java.lang.Object obj)
Equality depends on the items being the same concrete (derived) type and lexically identical. This does not depend on how the data is internally stored, eg using a prefix or not. Specifically designed to work for derived classes too by default.

Overrides:
equals in class java.lang.Object

contentEquals

public final boolean contentEquals(java.lang.CharSequence o)
True iff this represents the same sequence of char values as the specified sequence.

Parameters:
other - the sequence to compare this String against
Returns:
true if this represents the same sequence of char values as the specified sequence, false otherwise

cacheTermini

private byte[] cacheTermini()
Make cached termini and return the cached value; never null. Computes the cached form of the termini and stores it wrapped in a SoftReference, discarding anything else already present.

This has the form of all the chars of the shared prefix immediately followed by all the chars of the shared suffix, stored as byte values with no padding.


getPrevChainLength

public final int getPrevChainLength()
Chain depth, ie how deep our chain of prev values is, zero if this instance has no prev; non-negative. Can be used to help avoid getting silly tangles of non-GCable prev chains, and indeed might help avoid situations that might attempt to create circular references.

Not for normal use of this object!


getPrefixLen

public final int getPrefixLen()
Length of common prefix with prev item if any; non-negative. Only of minimal interest outside the class and may be withdrawn from the API.


getSuffixLen

private final int getSuffixLen()
Length of common suffix with prev item if any; non-negative. No one really needs to see this outside the class...

Being private may encourage a static optimiser such as ProGuard to inline it.


toString

public java.lang.String toString()
Convert to String containing same char sequence. We do not expensively intern() the result as it may be short-lived..

Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object

writeToByteArray

public final void writeToByteArray(byte[] buf,
                                   int offset)
Writes the name as ASCII bytes to the given (non-null) byte array starting at the given offset. The array must be long enough (from the given offset) else an exception will be thrown.

Aims to be efficient (in CPU and memory footprint).


toByteArray

public final byte[] toByteArray()
Create private byte[] copy for caller; never null. Aims to be efficient (in CPU and memory footprint).

Specified by:
toByteArray in interface TextUtils.CharSequence8Bit

charAtInternal

private byte charAtInternal(int index)
Internal version of charAt() with bounds checking assumed done, etc. This has no further cacheing...


charAt

public final char charAt(int index)
Specified by:
charAt in interface java.lang.CharSequence

length

public final int length()
Specified by:
length in interface java.lang.CharSequence

subSequence

public final java.lang.CharSequence subSequence(int start,
                                                int end)
Generate a light-weight wrapper, or for very small amounts of data, a copy; never null. Not necessarily Serializable.

Specified by:
subSequence in interface java.lang.CharSequence

validateObject

public void validateObject()
                    throws java.io.InvalidObjectException
Validate fields/state. Called in the constructor and possibly after deserialising.

Barf if something bad is found.

Specified by:
validateObject in interface java.io.ObjectInputValidation
Throws:
java.io.InvalidObjectException

readObject

private void readObject(java.io.ObjectInputStream in)
                 throws java.io.IOException,
                        java.lang.ClassNotFoundException
Deserialise. Expects validation to happen in readResolve().

Throws:
java.io.IOException
java.lang.ClassNotFoundException

writeObject

private void writeObject(java.io.ObjectOutputStream oos)
                  throws java.io.IOException
Write out a minimally-redundant form of our internal information. The more-efficient on-the-wire format also makes defensive reading easier.

Throws:
java.io.IOException

readResolve

protected java.lang.Object readResolve()
                                throws java.io.ObjectStreamException
Deserialise: validate this instance, coerce mildly-errant values, and eliminate duplicates. Has to handle derived classes too.

Throws:
java.io.ObjectStreamException

compareTo

public final int compareTo(java.lang.CharSequence o)
Provide natural sort order for this and derived classes.


compareTo

public final int compareTo(Name o)
Provide natural sort order for this and derived classes.

Specified by:
compareTo in interface java.lang.Comparable<Name>

computeIntegerPrefixKey

private static java.lang.Integer computeIntegerPrefixKey(java.lang.CharSequence cs)
Compute the Integer prefix key for the supplied CharSequence; never null. This assumes that the top 8 bits of each char are unimportant since if the sequence is to be turned into a Name they must be zero.

We construct the prefix key from the first four characters of the putative Name (or fewer if the name is shorter).


computeIntegerSuffixKey

private static java.lang.Integer computeIntegerSuffixKey(java.lang.CharSequence cs)
Compute the Integer suffix key for the supplied CharSequence; never null. This assumes that the top 8 bits of each char are unimportant since if the sequence is to be turned into a Name they must be zero.

We construct the suffix key from the last four characters of the putative Name (or fewer if the name is shorter).


getAdHocPrefixCache

private static final Name.AdHocPrefixCache<java.lang.Integer> getAdHocPrefixCache()
Get instance of AdHocPrefixCache; never null.


DHD Multimedia Gallery V1.57.21

Copyright (c) 1996-2011, Damon Hart-Davis. All rights reserved.