org.hd.d.pg2k.svrCore
Class ExhibitPropsComputableMutable

java.lang.Object
  extended by org.hd.d.pg2k.svrCore.ExhibitPropsComputableMutable
All Implemented Interfaces:
java.io.ObjectInputValidation, java.io.Serializable

public final class ExhibitPropsComputableMutable
extends java.lang.Object
implements java.io.Serializable, java.io.ObjectInputValidation

Immutable (and serialisable) store of all ephemeral computable auxiliary properties of a single exhibit. These are properties that are computed from exhibit and non-exhibit data and that need to be periodically recomputed, such as popularity score.

This is called "mutable" because although any one instance is immutable, successive values retrieved for the same exhibit may be different.

This data has limited lifetime, but may be somewhat useful even when stale, so can be persisted if desired.

This is designed to be efficient and robust on the wire and in memory, since these details will be held for each and every exhibit.

See Also:
Serialized Form

Nested Class Summary
static class ExhibitPropsComputableMutable.Factor
          Single (immutable) component of multi-factor value.
 
Field Summary
private static int _MIN_VLONG_SAMPLE_PERIODS
          Minimum number of VLONG samples to smooth out daily/weekly cycles in data.
private static ExhibitPropsComputableMutable.Factor _NOT_NEW
          Contribution for goodness of non-new exhibits; zero goodness but the same "newness" weight.
private static long _SAMPLE_CYCLE_PERIOD_MAJOR_MS
          The longest cycle time we will look for in historical data, in ms; strictly positive.
private static long _SAMPLE_CYCLE_PERIOD_MINOR_MS
          Minimum sample period to even out short cycles in Web access patterns, in ms; strictly positive.
private static int _SEASONAL_VLONG_CYCLE_PERIODS
          Number of VLONG samples corresponding to annual cycles in data.
private static ExhibitPropsComputableMutable.Factor FACTOR_HASDESC
          Weighting factor for an item with a specific description.
private static ExhibitPropsComputableMutable.Factor FACTOR_HASGENERICDESC
          Weighting factor for an item with a generic description.
static ExhibitPropsComputableMutable.Factor FACTOR_ZERO
          Zero factor; zero weight, zero goodness.
private static boolean FORCE_CORR_RECOMP
          If true then force correlation data up to date.
static float GOODBAD_LIMIT
          Limit of goodness*correlation to consider something (eg category) significantly good or bad.
static int GOODBAD_LIMIT_INT
          Limit of goodness*correlation to consider something (eg category) significantly good or bad, as an int value.
private  int goodness
          The "goodness" score, -MAX_VALUE is maximally bad and MAX_VALUE maximally good, zero is neutral.
static int MAX_AGE_BEFORE_STALE_MS
          Maximum age (in ms) of instances before considering them stale and recomputing them.
static float POP_AI_SCORER
          Maximum weighting from Scorer judgement of exhibit content.
static float POP_CORR_HASDESC
          Weighting/correlation of has-description component of popularity.
static float POP_CORR_NEWNESS
          Weighting/correlation of "newness" component of popularity.
static float POP_CORR_RANDOM
          Weighting/correlation of "random" component of popularity.
static float POP_RECENT_ACCESS
          Maximum weighting for exhibit recently viewed or downloaded.
static float POP_VOTE_CORR
          Maximum weighting from correlated voting on related exhibits.
static float POP_VOTES
          Maximum weighting for vox pop user votes.
private static long serialVersionUID
          Our serial version...
private  long staleAfter
          Time until which this data is considered valid; beyond this time is stale.
static ExhibitPropsComputableMutable TRIVIAL_NEUTRAL
          Trivially-stale entirely-neutral instance; never null.
private static boolean USE_ALL_BUCKET
          If true then include the "all" buckets in our calculations.
private static boolean USE_ALL_VOTES
          If true, use ALL available votes rather than a sampling in _computeSampleBitSet().
 
Constructor Summary
private ExhibitPropsComputableMutable(ExhibitStaticAttr esa, GenProps gp, AllExhibitProperties aep, BasicVarMgrInterface vars, ExhibitPropsComputableMutableVoteCacheIF voteCache, ScorerCacheIF scorers)
          Create a fully populated ExhibitPropsComputableMutable object.
private ExhibitPropsComputableMutable(long staleAfter, int goodness)
          Create an instance with the specified goodness and stale 'best before' time.
 
Method Summary
private static void _chooseRandomSlot(java.util.Random rnd, java.util.BitSet whichIntervals, boolean forceNew)
          Set a random bit in the BitSet to take a random sample.
private static ExhibitPropsComputableMutable.Factor _computeNewnessBonusFactor(ExhibitStaticAttr esa)
          Compute newness bonus (if any); never null.
private static ExhibitPropsComputableMutable.Factor _computeRandomGoodnessFactor()
          Compute random "goodness" factor symmetrical about zero; never null.
private static java.util.BitSet _computeSampleBitSet(long minNearMs)
          Compute BitSet of samples of event history to take; never null nor empty.
private static ExhibitPropsComputableMutable.Factor _computeTotalGoodnessFactor(ExhibitStaticAttr esa, GenProps gp, AllExhibitProperties aep, java.util.List<ExhibitPropsComputableMutable.Factor> initialComponents)
          Compute full goodness factor, using full data source if available; never null.
 long bestBefore()
          Get the basic 'best-before' time.
static java.util.Collection<ExhibitPropsComputableMutable.Factor> calcAccessFactors(java.lang.String exhibitName, AllExhibitProperties aep, BasicVarMgrInterface vars)
          Compute Factor(s) that depend on access data.
private static java.util.List<ExhibitPropsComputableMutable.Factor> calcCorrelationFactors(ExhibitStaticAttr esa, AllExhibitProperties aep, BasicVarMgrInterface vars, ExhibitPropsComputableMutableVoteCacheIF voteCache)
          Compute factors based on correlations.
static ExhibitPropsComputableMutable.Factor calcVoteFactor(java.lang.String exhibitName, AllExhibitProperties aep, BasicVarMgrInterface vars, long minNearMs)
          Compute exactly one Factor that depends on explicit user votes for specified exhibit; never null.
static ExhibitPropsComputableMutable compute(ExhibitStaticAttr esa, GenProps gp, AllExhibitProperties aep, BasicVarMgrInterface vars, ExhibitPropsComputableMutableVoteCacheIF voteCache, ScorerCacheIF scorers)
          Compute (accurate) value for a specified exhibit; never null.
static ExhibitPropsComputableMutable generateFastApproximation(ExhibitStaticAttr esa, GenProps gp)
          Compute a "quick"/approximate value for a specified exhibit; never null.
 int getGoodness()
          Get the "goodness" score, -MAX_VALUE is maximally bad and MAX_VALUE maximally good, zero is neutral.
 float getGoodnessAsFloat()
          Get the "goodness" score as a normalised float in the range -1 (bad) to +1 (good), 0 is neutral.
 java.lang.Boolean isGood()
          Find out if this exhibit is rated "good"/popular or not.
 boolean isStale()
          If true, the data in this object is stale.
 boolean isTriviallyStale()
          If true, the data in this object is not only stale; it was only ever a fast approximation.
protected  java.lang.Object readResolve()
          Deserialise: use constructor for validation, defensive copying, etc.
 void validateObject()
          Validate fields/state.
protected  java.lang.Object writeReplace()
          Serialise: replace all trivially-stale values being serialised with TRIVIAL_NEUTRAL.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GOODBAD_LIMIT

public static final float GOODBAD_LIMIT
Limit of goodness*correlation to consider something (eg category) significantly good or bad.

See Also:
Constant Field Values

GOODBAD_LIMIT_INT

public static final int GOODBAD_LIMIT_INT
Limit of goodness*correlation to consider something (eg category) significantly good or bad, as an int value.


TRIVIAL_NEUTRAL

public static final ExhibitPropsComputableMutable TRIVIAL_NEUTRAL
Trivially-stale entirely-neutral instance; never null.


FORCE_CORR_RECOMP

private static final boolean FORCE_CORR_RECOMP
If true then force correlation data up to date. If false then accept somewhat stale or missing data, since this only changes slowly and a complete absence is neutral.

See Also:
Constant Field Values

staleAfter

private final long staleAfter
Time until which this data is considered valid; beyond this time is stale. If zero, or before the current time of day, the data is stale.

If the stale date is too far in the future for us to believe, ie much more than would be allowed with this version of the object, then we treat the data as stale in case it is from a version of this object with very different staleness limits or there was a persistence error.
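
The staleness rule described above can be sketched roughly as follows. This is an illustration only: the class name StalenessSketch, the MAX_AGE_MS value and the factor of two used to decide "too far in the future" are all assumptions, not values taken from this class.

```java
/** Hypothetical sketch of the staleAfter rule; not the actual implementation. */
final class StalenessSketch {
    /** Assumed maximum age in ms (one week); the real constant's value is not shown here. */
    static final long MAX_AGE_MS = 7L * 24 * 60 * 60 * 1000;

    /**
     * Returns true if data valid until staleAfter should be treated as stale at time now.
     * Both times are ms since the epoch; now is assumed to be non-negative.
     */
    static boolean isStale(final long staleAfter, final long now) {
        if (staleAfter <= 0) { return true; }   // Zero (or invalid): never valid.
        if (staleAfter < now) { return true; }  // Validity period has already passed.
        // Implausibly far in the future: distrust it (assumed limit of twice the maximum age).
        return (staleAfter - now) > 2 * MAX_AGE_MS;
    }
}
```

Treating an implausible future date as stale means that a corrupted or very old persisted value simply costs one recomputation rather than lingering indefinitely.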


MAX_AGE_BEFORE_STALE_MS

public static final int MAX_AGE_BEFORE_STALE_MS
Maximum age (in ms) of instances before considering them stale and recomputing them. The limit differs slightly (by much less than a factor of two) between class/JVM/mirror instances, to spread out recomputations.

Each instance of this object should pick a random time between about half of this value and the full value before going stale in order to spread out refresh computations over a reasonable interval. Recomputing these values may prove quite expensive.

This value is selected so that most exhibits' actual scores will not have changed significantly in the given interval; something of the order of tens of hours to tens of days is good.
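
Picking a random lifetime between about half of the maximum age and the full maximum might look like the sketch below. The class and method names are hypothetical, and a uniform distribution is an assumption; the text above only requires that the choice be random within that range.

```java
import java.util.Random;

/** Hypothetical sketch of spreading out staleness times; not the actual implementation. */
final class StaleTimeSketch {
    /**
     * Returns a stale-after time between now + maxAgeMs/2 and now + maxAgeMs inclusive.
     * maxAgeMs must be strictly positive.
     */
    static long pickStaleAfter(final long now, final int maxAgeMs, final Random rnd) {
        final int half = maxAgeMs / 2;
        // Uniform lifetime in [half, maxAgeMs]; the nextInt() bound is exclusive, hence the +1.
        final int lifetime = half + rnd.nextInt(maxAgeMs - half + 1);
        return now + lifetime;
    }
}
```

Because each instance draws its own lifetime, a batch of exhibits computed together at start-up will not all fall due for recomputation at the same moment.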


goodness

private final int goodness
The "goodness" score, -MAX_VALUE is maximally bad and MAX_VALUE maximally good, zero is neutral. This is a composite of many factors, including a small random element.

Has enough significant digits to allow a total ordering even over a very large number of exhibits.

This is nominally thought of as lying in the range [-1.0, +1.0].


POP_VOTES

public static final float POP_VOTES
Maximum weighting for vox pop user votes. Should be large but possibly not overwhelming, eg because of pranks to attempt to "fix" the votes.

The vote element can count for or against an exhibit (ie be +ve or -ve), and an absence of any votes may tend to dilute goodness back towards neutral.

See Also:
Constant Field Values

POP_VOTE_CORR

public static final float POP_VOTE_CORR
Maximum weighting from correlated voting on related exhibits. Possibly quite a crude measure, though will be more valuable with more data available, and so if all our required samples are in place, then we'll give it a rating that should be visible to users, and more significant than just newness, for example.

See Also:
Constant Field Values

POP_AI_SCORER

public static final float POP_AI_SCORER
Maximum weighting from Scorer judgement of exhibit content. This AI-based mechanism inspects the content of each exhibit, and is driven to get the best possible fit with voting data.

We give this equal weighting with the metadata-driven statistical methods.

See Also:
Constant Field Values

POP_RECENT_ACCESS

public static final float POP_RECENT_ACCESS
Maximum weighting for exhibit recently viewed or downloaded. Possibly quite a crude measure, though will be more valuable with more data available, and so if all our required samples are in place, we'll give it a rating that should be visible to users, and more significant than just newness, for example.

We make this a little less important than vote correlations.

See Also:
Constant Field Values

POP_CORR_NEWNESS

public static final float POP_CORR_NEWNESS
Weighting/correlation of "newness" component of popularity. This is only applied as a positive (proportional) factor for exhibits considered "new", since newer items are potentially more interesting.

This factor should be relatively small, not overwhelming manually-set weightings for example, and mainly intended as a tie-breaker.

See Also:
Constant Field Values

POP_CORR_RANDOM

public static final float POP_CORR_RANDOM
Weighting/correlation of "random" component of popularity. Exists to help break ties and perturb the popularity ordering a little. Should be very small so as not to outweigh any genuine factor, but more-or-less force a total ordering.

This factor falls equally either side of zero, ie it is a +/- limit.

See Also:
Constant Field Values
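
A random component that falls equally either side of zero within a +/- limit could be generated along these lines. The names are hypothetical and the limit passed in is a stand-in for POP_CORR_RANDOM, whose actual value is not shown here.

```java
import java.util.Random;

/** Hypothetical sketch of a small symmetric random perturbation; not the actual implementation. */
final class RandomFactorSketch {
    /** Returns a value in the range [-limit, +limit), uniformly distributed about zero. */
    static float randomGoodness(final Random rnd, final float limit) {
        // nextFloat() is in [0, 1), so (2x - 1) is in [-1, 1).
        return (rnd.nextFloat() * 2f - 1f) * limit;
    }
}
```

With a very small limit this never outweighs a genuine factor, but it makes exact ties between exhibits unlikely.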

POP_CORR_HASDESC

public static final float POP_CORR_HASDESC
Weighting/correlation of has-description component of popularity. Applied to items that have a description as slightly more interesting than those without.

Should generally be greater than the randomness weighting, and probably less than the newness rating.


FACTOR_HASDESC

private static final ExhibitPropsComputableMutable.Factor FACTOR_HASDESC
Weighting factor for an item with a specific description.


FACTOR_HASGENERICDESC

private static final ExhibitPropsComputableMutable.Factor FACTOR_HASGENERICDESC
Weighting factor for an item with a generic description. Carries less weight than having a specific description.

May apply to having location or AKA or description text.


FACTOR_ZERO

public static final ExhibitPropsComputableMutable.Factor FACTOR_ZERO
Zero factor; zero weight, zero goodness.


_NOT_NEW

private static final ExhibitPropsComputableMutable.Factor _NOT_NEW
Contribution for goodness of non-new exhibits; zero goodness but the same "newness" weight.


_SAMPLE_CYCLE_PERIOD_MINOR_MS

private static final long _SAMPLE_CYCLE_PERIOD_MINOR_MS
Minimum sample period to even out short cycles in Web access patterns, in ms; strictly positive. Must usually be a multiple (and minimum) of a week to allow for most common daily and weekly access patterns to be smoothed as well as possible.

See Also:
Constant Field Values

_SAMPLE_CYCLE_PERIOD_MAJOR_MS

private static final long _SAMPLE_CYCLE_PERIOD_MAJOR_MS
The longest cycle time we will look for in historical data, in ms; strictly positive. This is usually a year to look for seasonal patterns, eg what is popular at Christmas or in the summer.

See Also:
Constant Field Values

_SEASONAL_VLONG_CYCLE_PERIODS

private static final int _SEASONAL_VLONG_CYCLE_PERIODS
Number of VLONG samples corresponding to annual cycles in data.


_MIN_VLONG_SAMPLE_PERIODS

private static final int _MIN_VLONG_SAMPLE_PERIODS
Minimum number of VLONG samples to smooth out daily/weekly cycles in data.


USE_ALL_BUCKET

private static final boolean USE_ALL_BUCKET
If true then include the "all" buckets in our calculations.

See Also:
Constant Field Values

USE_ALL_VOTES

private static final boolean USE_ALL_VOTES
If true, use ALL available votes rather than a sampling in _computeSampleBitSet(). If this is false then a caller of _computeSampleBitSet() can get a similar effect with a large minNearMs value.

See Also:
Constant Field Values

serialVersionUID

private static final long serialVersionUID
Our serial version...

See Also:
Constant Field Values

Constructor Detail

ExhibitPropsComputableMutable

private ExhibitPropsComputableMutable(long staleAfter,
                                      int goodness)
Create an instance with the specified goodness and stale 'best before' time.


ExhibitPropsComputableMutable

private ExhibitPropsComputableMutable(ExhibitStaticAttr esa,
                                      GenProps gp,
                                      AllExhibitProperties aep,
                                      BasicVarMgrInterface vars,
                                      ExhibitPropsComputableMutableVoteCacheIF voteCache,
                                      ScorerCacheIF scorers)
Create a fully populated ExhibitPropsComputableMutable object. This is given the data sources from which it can fetch the exhibit data one or more times to do its computations.

The data source references are not stored in the object.

Parameters:
esa - the exhibit to compute properties for; never null
gp - the system properties; never null
aep - the exhibit properties; can be null
vars - source of event data; can be null
scorers - source of content-based scoring; can be null
voteCache - vote cache; can be null

Method Detail

generateFastApproximation

public static ExhibitPropsComputableMutable generateFastApproximation(ExhibitStaticAttr esa,
                                                                      GenProps gp)
Compute a "quick"/approximate value for a specified exhibit; never null. This uses only values from the exhibit static data plus the system properties object to compute a fast approximation to a full value.

The value is immediately marked as stale (unless it is possible to compute the exact value fast), but is useful to generate something quickly if the system is busy, or at start-up for example.

Parameters:
esa - the exhibit to compute properties for; never null
gp - the system properties; never null
Returns:
non-null quick approximation

compute

public static ExhibitPropsComputableMutable compute(ExhibitStaticAttr esa,
                                                    GenProps gp,
                                                    AllExhibitProperties aep,
                                                    BasicVarMgrInterface vars,
                                                    ExhibitPropsComputableMutableVoteCacheIF voteCache,
                                                    ScorerCacheIF scorers)
Compute (accurate) value for a specified exhibit; never null. This uses values from the exhibit static data plus the system properties object plus the data source (and thus might take a long time and be expensive) to compute an accurate/full value.

Provided the supplied data source is non-null and functional, and the GenProps does not appear to be empty (zero timestamp), then the value will not become stale for at least half of MAX_AGE_BEFORE_STALE_MS, and will become stale in no more than MAX_AGE_BEFORE_STALE_MS ms.

If the data source is not fully available/functional then an approximation will be returned instead, and marked stale to allow recalculation later.

Parameters:
esa - the exhibit to compute properties for; never null
gp - the system properties; never null
aep - the exhibit properties; can be null
vars - source of event data; can be null
voteCache - vote cache; can be null
scorers - source of content-based scoring; can be null
Returns:
non-null full value or approximation depending on the available data sources

calcCorrelationFactors

private static java.util.List<ExhibitPropsComputableMutable.Factor> calcCorrelationFactors(ExhibitStaticAttr esa,
                                                                                           AllExhibitProperties aep,
                                                                                           BasicVarMgrInterface vars,
                                                                                           ExhibitPropsComputableMutableVoteCacheIF voteCache)
                                                                                    throws java.io.IOException
Compute factors based on correlations. These have a confidence pre-scaled in the range 0 to POP_VOTE_CORR.

Throws:
java.io.IOException

isStale

public boolean isStale()
If true, the data in this object is stale. The data may still be correct, but should be treated as an approximation, and recomputed as soon as possible.

It is possible for an item previously non-stale that then became 'stale' to become non-stale again for a limited time while the system is conserving energy in low-power mode, thus avoiding the need for some expensive recomputation. Trivially-stale instances, ie that were never fully computed, never become non-stale again.


isTriviallyStale

public boolean isTriviallyStale()
If true, the data in this object is not only stale; it was only ever a fast approximation. The data is probably not worth persisting, for example, since it was not fully computed and could probably be very quickly recomputed.


bestBefore

public final long bestBefore()
Get the basic 'best-before' time. Once this has passed the data is potentially stale and isStale() will be true unless the system is in power-conserving mode, in which case staleness can be deferred for some considerable time.

This value is useful to help schedule recomputations most-stale-first.

For trivially-stale instances this is always zero.
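
Scheduling recomputations most-stale-first from best-before times could be sketched as below. The class name and the use of a Map from exhibit name to best-before time are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of most-stale-first scheduling; not the actual implementation. */
final class RecomputeOrderSketch {
    /** Returns exhibit names ordered by ascending best-before time, ie most stale first. */
    static List<String> mostStaleFirst(final Map<String, Long> bestBeforeByExhibit) {
        final List<Map.Entry<String, Long>> entries =
            new ArrayList<>(bestBeforeByExhibit.entrySet());
        // Smallest best-before first; trivially-stale entries (zero) sort to the front.
        entries.sort(Map.Entry.comparingByValue());
        final List<String> names = new ArrayList<>(entries.size());
        for (final Map.Entry<String, Long> e : entries) { names.add(e.getKey()); }
        return names;
    }
}
```

Since trivially-stale instances always report zero, they naturally come first in this ordering.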


getGoodness

public int getGoodness()
Get the "goodness" score, -MAX_VALUE is maximally bad and MAX_VALUE maximally good, zero is neutral. This gets the value as a raw int value; useful for fast compares, sorts, etc.


getGoodnessAsFloat

public float getGoodnessAsFloat()
Get the "goodness" score as a normalised float in the range -1 (bad) to +1 (good), 0 is neutral. This is represented as a float which can lose precision/information, but may be easier to work with for display and some computations.
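
The relationship between the raw int score and the normalised float might be sketched as follows, assuming a simple linear scaling by Integer.MAX_VALUE. The class and method names are hypothetical, and the scaling actually used by this class is not documented here.

```java
/** Hypothetical sketch of int/float goodness scaling; not the actual implementation. */
final class GoodnessScaleSketch {
    /** Maps a raw score in [-MAX_VALUE, +MAX_VALUE] to a float in [-1, +1]. */
    static float asFloat(final int goodness) {
        return goodness / (float) Integer.MAX_VALUE;
    }

    /** Maps a float to a raw score, clamping to [-1, +1] first. */
    static int fromFloat(final float g) {
        final float clamped = Math.max(-1f, Math.min(1f, g));
        // Multiply in double precision to avoid overflow at the extremes.
        return (int) Math.round(clamped * (double) Integer.MAX_VALUE);
    }
}
```

A float has only 24 bits of mantissa, so nearby raw scores can collapse to the same float value; sorting on the int from getGoodness() preserves an ordering that the float form can lose.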


isGood

public java.lang.Boolean isGood()
Find out if this exhibit is rated "good"/popular or not. Returns TRUE if rated good, FALSE if bad, null if not significantly either (ie too close to neutral) or if unknown.

This is based on the computed "goodness" score.
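
A three-valued result of this kind can be sketched as below. The threshold LIMIT and the names used are hypothetical; the actual cut-off applied by isGood() is not documented here.

```java
/** Hypothetical sketch of a tri-state good/bad/neutral test; not the actual implementation. */
final class IsGoodSketch {
    /** Assumed distance from neutral needed to count as significantly good or bad. */
    static final float LIMIT = 0.1f;

    /** Returns TRUE if good, FALSE if bad, null if too close to neutral to say. */
    static Boolean isGood(final float goodness) {
        if (goodness >= LIMIT) { return Boolean.TRUE; }
        if (goodness <= -LIMIT) { return Boolean.FALSE; }
        return null;
    }

    /** Example caller: test against the constants to avoid unboxing a null. */
    static String describe(final float goodness) {
        final Boolean good = isGood(goodness);
        if (Boolean.TRUE.equals(good)) { return "good"; }
        if (Boolean.FALSE.equals(good)) { return "bad"; }
        return "neutral/unknown";
    }
}
```

Callers should compare the result with Boolean.TRUE or Boolean.FALSE rather than unboxing it directly, since a null result would otherwise cause a NullPointerException.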


_computeRandomGoodnessFactor

private static ExhibitPropsComputableMutable.Factor _computeRandomGoodnessFactor()
Compute random "goodness" factor symmetrical about zero; never null.


_computeNewnessBonusFactor

private static ExhibitPropsComputableMutable.Factor _computeNewnessBonusFactor(ExhibitStaticAttr esa)
Compute newness bonus (if any); never null. If not new enough to be considered "new" then a zero factor is returned.

This is not necessarily precisely aligned with the "isExhibitNew()" result, though except in race conditions they should be aligned.

Parameters:
esa - exhibit data; never null
Returns:
factor, positive but low weight if exhibit "new", else zero with zero weighting

_computeTotalGoodnessFactor

private static ExhibitPropsComputableMutable.Factor _computeTotalGoodnessFactor(ExhibitStaticAttr esa,
                                                                                GenProps gp,
                                                                                AllExhibitProperties aep,
                                                                                java.util.List<ExhibitPropsComputableMutable.Factor> initialComponents)
Compute full goodness factor, using full data source if available; never null.

Parameters:
esa - exhibit data; never null
initialComponents - initial set of (data-dependent) Factor elements; possibly empty but never null

calcAccessFactors

public static java.util.Collection<ExhibitPropsComputableMutable.Factor> calcAccessFactors(java.lang.String exhibitName,
                                                                                           AllExhibitProperties aep,
                                                                                           BasicVarMgrInterface vars)
                                                                                    throws java.io.IOException
Compute Factor(s) that depend on access data. If this cannot fetch some of the data it requires due to an IOException (not the same as fetching data successfully that has no entries) then the exception is propagated to the caller. (Normally this results in the EPCM item for which this is called being marked stale so that we can try again when the data is available.)

The comment fields of the factors produced are the event names from which they were generated.

This is public to assist testing.

Throws:
java.io.IOException - if this was unable to fetch some data required

calcVoteFactor

public static ExhibitPropsComputableMutable.Factor calcVoteFactor(java.lang.String exhibitName,
                                                                  AllExhibitProperties aep,
                                                                  BasicVarMgrInterface vars,
                                                                  long minNearMs)
                                                           throws java.io.IOException
Compute exactly one Factor that depends on explicit user votes for specified exhibit; never null. Because we expect voting to be sparse, we total up votes across all sample intervals to reduce noise, rather than taking each sample separately, and do various other calculations a little differently to dense measures.

If this cannot fetch some of the data it requires due to an IOException (not the same as fetching data successfully that has no entries) then the exception is propagated to the caller. (Normally this results in the EPCM item for which this is called being marked stale so that we can try again when the data is available.)

This is public to assist testing and computation of correlations.

The returned factor's goodness can range from -1 to +1, and confidence from 0 to +1; any scaling required will have to be applied elsewhere.

Parameters:
minNearMs - minimum interval (ms) at near end to include all available samples for
Throws:
java.io.IOException - if this was unable to fetch some data required
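
The idea of totalling sparse votes across all sample intervals before deriving a single factor can be illustrated as below. Everything in this sketch is an assumption: the Factor holder, the for/against arrays, the goodness formula and the saturating confidence curve are invented for illustration and are not taken from calcVoteFactor().

```java
/** Hypothetical sketch of pooling sparse votes into one factor; not the actual implementation. */
final class VoteFactorSketch {
    /** Assumed holder: goodness in [-1, +1], confidence in [0, +1]. */
    static final class Factor {
        final float goodness;
        final float confidence;
        Factor(final float goodness, final float confidence) {
            this.goodness = goodness;
            this.confidence = confidence;
        }
    }

    /** Assumed number of votes at which confidence reaches one half. */
    static final int HALF_CONFIDENCE_VOTES = 10;

    /** Pools per-interval vote counts (assumed non-negative) into a single factor. */
    static Factor total(final int[] votesFor, final int[] votesAgainst) {
        long up = 0;
        long down = 0;
        for (final int v : votesFor) { up += v; }
        for (final int v : votesAgainst) { down += v; }
        final long total = up + down;
        if (total == 0) { return new Factor(0f, 0f); } // No votes: neutral, no confidence.
        final float goodness = (up - down) / (float) total;
        final float confidence = total / (float) (total + HALF_CONFIDENCE_VOTES);
        return new Factor(goodness, confidence);
    }
}
```

Pooling first means that a handful of votes scattered over many intervals still yields one usable signal, rather than many intervals each too noisy to use.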

_computeSampleBitSet

private static java.util.BitSet _computeSampleBitSet(long minNearMs)
Compute BitSet of samples of event history to take; never null nor empty. Computes a BitSet of length up to SystemVariables.EVENT_SAMPLES_RETAINED slots, intended to be used with the EventPeriod.VLONG interval size, with index 0 indicating the previous slot (ie no use of the current period).

This includes:

Parameters:
minNearMs - if positive, the minimum number of recent ms that the sampling should cover (else an internal minimum will be imposed)
Returns:
non-null, non-empty BitSet

_chooseRandomSlot

private static void _chooseRandomSlot(java.util.Random rnd,
                                      java.util.BitSet whichIntervals,
                                      boolean forceNew)
Set a random bit in the BitSet to take a random sample.

It is possible (though may be expensive) to insist that a new (previously false) bit is set, but only if (much) fewer than SystemVariables.EVENT_SAMPLES_RETAINED bits are already true.

Parameters:
rnd - random number source; never null
whichIntervals - the BitSet to set a random bit in; never null
forceNew - if true, ensure that we set a new bit, ie one that was false, unless at least half of all possible bits have already been set
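
A retry-based way of setting a random bit, optionally insisting on a previously clear one, is sketched below. The names are hypothetical, the explicit slots parameter stands in for SystemVariables.EVENT_SAMPLES_RETAINED, and retrying until a clear bit is found is an assumed strategy rather than the documented one.

```java
import java.util.BitSet;
import java.util.Random;

/** Hypothetical sketch of random slot selection; not the actual implementation. */
final class RandomSlotSketch {
    /**
     * Sets one random bit in the range [0, slots); slots must be strictly positive.
     * If forceNew is true and fewer than half of the slots are set,
     * the bit chosen is guaranteed to have been clear beforehand.
     */
    static void chooseRandomSlot(final Random rnd, final BitSet whichIntervals,
                                 final int slots, final boolean forceNew) {
        // Only insist on a new bit while retries are cheap, ie under half full.
        final boolean mustBeNew = forceNew && (whichIntervals.cardinality() < slots / 2);
        int i;
        do { i = rnd.nextInt(slots); }
        while (mustBeNew && whichIntervals.get(i));
        whichIntervals.set(i);
    }
}
```

Giving up on forceNew once half of the slots are set keeps the expected number of retries below two per call.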

readResolve

protected java.lang.Object readResolve()
Deserialise: use constructor for validation, defensive copying, etc. Replace all trivially-stale "empty" values with a single shared value.

NOTE: this may not always preserve all values as expected in future.


writeReplace

protected java.lang.Object writeReplace()
Serialise: replace all trivially-stale values being serialised with TRIVIAL_NEUTRAL. We save all other (non-trivially-stale) values as-is, even if stale, because they may have taken significant resources to compute and may still be much more accurate than any fast approximation we could compute.

NOTE: this is effectively a LOSSY compression mechanism.
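
The interplay of writeReplace() and readResolve() described here follows a standard Java serialisation pattern, sketched below with a cut-down stand-in class. The class SerialSketch, its fields and the rule that a zero staleAfter means "trivially stale" are assumptions for illustration; only the hook method names come from the Java serialisation mechanism itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical cut-down stand-in showing the replace/resolve pattern. */
final class SerialSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Single shared trivially-stale neutral instance. */
    static final SerialSketch TRIVIAL_NEUTRAL = new SerialSketch(0L, 0);

    private final long staleAfter;
    private final int goodness;

    SerialSketch(final long staleAfter, final int goodness) {
        if (staleAfter < 0) { throw new IllegalArgumentException("negative staleAfter"); }
        this.staleAfter = staleAfter;
        this.goodness = goodness;
    }

    int getGoodness() { return goodness; }

    /** Assumed rule: a zero best-before time marks a trivially-stale instance. */
    boolean isTriviallyStale() { return staleAfter == 0L; }

    /** On write: swap any trivially-stale value for the shared neutral one (lossy). */
    private Object writeReplace() { return isTriviallyStale() ? TRIVIAL_NEUTRAL : this; }

    /** On read: revalidate via the constructor, and share the neutral instance. */
    private Object readResolve() {
        return isTriviallyStale() ? TRIVIAL_NEUTRAL : new SerialSketch(staleAfter, goodness);
    }

    /** Serialises and deserialises a value in memory, for demonstration. */
    static SerialSketch roundTrip(final SerialSketch in) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(in); }
            try (ObjectInputStream is =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerialSketch) is.readObject();
            }
        } catch (final IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch a trivially-stale value with a non-zero goodness comes back as the shared neutral instance, which is the lossy behaviour noted above, while fully computed values survive the round trip unchanged.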


validateObject

public void validateObject()
                    throws java.io.InvalidObjectException
Validate fields/state. Called in the constructor and possibly after de-serialising.

Barf if something bad is found. (Maybe allow some extra info in debug version.)

Specified by:
validateObject in interface java.io.ObjectInputValidation
Throws:
java.io.InvalidObjectException

DHD Multimedia Gallery V1.50.55

Copyright (c) 1996-2008, Damon Hart-Davis. All rights reserved.