|
25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more... |
Home Projects Up Sitemap Search Blog Forum+Chat About Us Privacy Terms of Use Feedback FAQ Images Services Ads Donate Humor |
|
|
An array to calculate statisticsStatistics array. A Borland C++ class that accumulates a list of numbers or ordered pairs and can report statistics about them at any time, including regression, line of best fit, and for some purposes, chi squared. Its potential usefulness for artificial intelligence is that overloaded operators allow it to be used as a variable, a variable that has a history of its own prior values, and can thus potentially make inferences, predictions, and judgments based on them. Unfortunately, I do not yet have an application that uses it in this manner. This class is not practical to use on a large data set (thousands of points) because the processing time is prohibitive. |
|
/* stataray.h 7-1-2000
Copyright (C)1997-2001, 2006 Steven Whitney.
Initially published by http://25yearsofprogramming.com
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
Version 2 as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
------
Notes:
Should be a very handy drop-in class. Holds a 1 or 2 dimensional array of
data points, and can return statistics about the data.
Originally developed using Borland C++ 4.0, it uses Borland's iostream classes,
stream manipulators, BIDS template container class library, and complex (number) class.
It doesn't appear to contain any operating system dependent code, and should work under
MSDOS, Windows (any version), and probably any other platform.
#include this file AFTER my.h and mylib.cpp
#include only this file for the class declaration, or only stataray.cpp to pull in everything,
including this header.
SStatArray size is > 1k, so allow enough stack or allocate dynamically.
------
To Do:
--figure out quadratic (look at Excel's 6th power capability)
--maybe allow creating an array of predicted values for comparison against the real ones.
--would a sine transform catch cycles?
--there is now room to add the equation to GetCurveText, but what use without
being able to display exponents?
--a variant of this might be useful as a general purpose "state correlator &
predictor", e.g. to help determine cause & effect or probabilities of outcomes.
you'd code states numerically. Example in chess: given this board state, my
opponent usually does this... More generally, when this state exists, that one
usually follows. (see complex.doc for possible related discussion, and for
related discussion of pattern-finding: "increasing", "decreasing" can be
determined from slope, etc.).
--One note suggested a new BOOL StatusChanged(), TRUE if the last 2 Y values not the
same, for use in simulated senses for quickly determining if a change has occurred
that needs attention or processing. But may not be much use: the fact that the 2
y values are different doesn't mean they've changed since the last time you checked.
So what you really need is a flag that keeps track of when you accessed the
latest value AND whether any changes occurred since then; more trouble than it's
worth until you actually need it.
------
Notes:
--X is independent variable, Y is dependent.
--complex is used because it consists of 2 doubles (what is needed), and
because some of its member functions might turn out useful.
--functions returning complex: x result is in real, y result in imag.
--You can use either Est or Pop in calculating dCorr. The N or N-1 factors all cancel out anyway.
--8CurveFit and Auto-Curve Choice both square their correlations to get R-Squared,
probably because R2 is always positive and easier to test. Multiple regression
seems to use R-Squared, which IS r squared. But for this purpose, r is
probably correct. R squared magnifies the distance from 1.0.
--All the Y Intercept calculations look ok when compared to a graph (TIPPv35.DAT)
--For all curves, the reported A B values are the right ones to plug back into the
untransformed form of the equation
--For the few I checked in Excel, plotting raw data on graphs with transformed axes
really does turn the curve into a line.
-----
Uses:
--May be useful in applications such as Wadapt.cpp, where you want to make
data accumulation and statistical calculations an easily-accomplished
"functional subunit". When used in place of a regular variable, it gives an
agent a historical record of the values of the variable, and the inherent
ability to predict future values, detect trends, etc.
--elsewhere could eliminate the need for creating data files for export to Excel
--Although this can handle only 2 variables at a time, it may be possible to
analyze multi-variate data sets, 2 at a time, by combining 2 or more StatObjs,
or by combining their results(?).
--The purpose of the Chi Squared feature is to allow an intelligent agent to answer: "Is this
occurrence normal (statistically likely) or abnormal (statistically unlikely) and
therefore interesting and worth paying attention to?"
*/
#ifndef __STATARAY_H
#define __STATARAY_H
#include <complex.h>
#include <classlib\arrays.h>
#include "c:\bcs\my.h"
//////////////////////////////////////////////////////////////////////////////
// an SStatArray holds a data set, and can provide statistics about it.
class SStatArray : public TArrayAsVector<complex>
{
public:
SStatArray(int upper = 100, int lower = 0, int delta = 1);
SStatArray(const SStatArray&, // copy all or a subset
double = -MAXDOUBLE, double = MAXDOUBLE, // of the source's data
double = -MAXDOUBLE, double = MAXDOUBLE);
SStatArray& operator = (const SStatArray& other); // create copy
double operator = (double d); // add # to array
complex operator = (complex c); // add # pair to array
BOOL operator == (const SStatArray& other) const;
operator complex (); // allow use as a complex
operator double (); // allow use as a double
friend ostream& operator << (ostream& os, SStatArray& s); // write stats
friend istream& operator >> (istream& is, SStatArray& s); // append array data
ostream& WriteData(ostream& os); // write array data
void WriteData(const string& filename); // write array data to file
void ReadData(const string& filename); // read array data from a file
int Calc(); // force (re)calculating all statistics
int mCalc() { return(changed ? Calc() : 1); } // recalc only if data changed
// these values are always for the raw data
complex High() { mCalc(); return(dHigh); }
complex Low() { mCalc(); return(dLow); }
complex Last();
complex Range() { mCalc(); return(dRange); }
complex Median() { mCalc(); return(dMedian); }
complex Mode() { mCalc(); return(dMode); }
complex ModeFreq() { mCalc(); return(dModeFreq); }
// you usually want these for the linear model (raw data), but can get the others
// 0 means linear
complex Sum(int i = 0) { mCalc(); return(dSum[i]); }
complex Mean(int i = 0) { mCalc(); return(dMean[i]); }
complex StDevEst(int i = 0) { mCalc(); return(dStDevEst[i]); }
complex StDevPop(int i = 0) { mCalc(); return(dStDevPop[i]); }
// additional versions of the above, returning result for only X or Y
double HighX() { return(real(High())); }
double LowX() { return(real(Low())); }
double LastX() { return(real(Last())); }
double RangeX() { return(real(Range())); }
double MedianX() { return(real(Median())); }
double ModeX() { return(real(Mode())); }
double ModeXFreq() { return(real(ModeFreq())); }
double SumX(int i = 0) { return(real(Sum(i))); }
double MeanX(int i = 0) { return(real(Mean(i))); }
double StDevEstX(int i = 0) { return(real(StDevEst(i))); }
double StDevPopX(int i = 0) { return(real(StDevPop(i))); }
double HighY() { return(imag(High())); }
double LowY() { return(imag(Low())); }
double LastY() { return(imag(Last())); }
double RangeY() { return(imag(Range())); }
double MedianY() { return(imag(Median())); }
double ModeY() { return(imag(Mode())); }
double ModeYFreq() { return(imag(ModeFreq())); }
double SumY(int i = 0) { return(imag(Sum(i))); }
double MeanY(int i = 0) { return(imag(Mean(i))); }
double StDevEstY(int i = 0) { return(imag(StDevEst(i))); }
double StDevPopY(int i = 0) { return(imag(StDevPop(i))); }
// you usually want these for the BestFit model, but can get the others
// -1 means use BestFit
double Corr(int i = -1);
double Slope(int i = -1);
double YIntercept(int i = -1);
double XIntercept(int i = -1);
double PredictX(double y, int i = -1);
double PredictY(double x, int i = -1);
complex Next(int i = -1); // return prediction of the next point (x,y)
string GetCurveText(int i = -1); // provide text description of curve
int GetBestFit() { mCalc(); return(BestFit); }
// other functions (functions tailored for particular applications go here)
complex ModeYForXBetween(double lowx = -MAXDOUBLE, double highx = MAXDOUBLE);
BOOL ConvertToTimeSeries();
BOOL ConvertFromTimeSeries();
complex ChiSq();
BOOL ChiSqIsSignif(BOOL useYates = FALSE);
double GetC5();
double GetC10();
// overridden array functions that can change the array data, plus some alternate versions
// used double& variables to match existing fns? Or just unnecessary?
int Add(const complex& c);
int Add(const double& x, const double& y);
int Add(const double& y);
// special-purpose: for string description & analysis
int Add(const string& s); // adds all chars, to length()
int Add(const char* s); // adds all chars, to 0 terminator
int AddAt(const complex& t, int loc);
int AddAt(const double& x, const double& y, int loc);
int Detach(const complex& t);
int Detach(int loc);
int Detach(const double& x, const double& y);
int Destroy(const complex& t);
int Destroy(int loc);
int Destroy(const double& x, const double& y);
complex& operator [](int loc) const;
void ForEach(IterFunc iter, void *args);
complex *FirstThat(CondFunc cond, void *args);
complex *LastThat(CondFunc cond, void *args);
void Flush();
// variables
enum { LINE, LOG, PARA, HYPER, INVLINE, EXP, POW, QUAD }; // transformation types
BOOL OutputAllStats; // if FALSE, << writes only data for BestFit
BOOL changed; // array data has been changed
protected:
void Summate(int,double,double); // utility function tallies dSum, dSum2, dSumXY
int BestFit; // which curve best fits the data
// not used in regression calcs. 1 of each is sufficient. (describes raw data only)
complex dHigh;
complex dLow;
complex dRange;
complex dMedian;
complex dMode;
complex dModeFreq;
// used for regression, one for each curve.
// each array index corresponds to the data needed for one type of curve,
// so for linear you use all the [0] elements. some of the arrays contain
// duplicate data, but the ease of use is worth it.
BOOL Allow[8]; // whether calculation for this curve can be allowed
complex dSum[8]; // sums of the transformed x and y inputs
complex dSum2[8]; // sums of the squares of the transformed x and y inputs
complex dMean[8]; // means of the transformed x and y inputs
complex dStDevEst[8]; // n-1 StDev of the transformed x and y inputs
complex dStDevPop[8]; // n StDev of the transformed x and y inputs
double dSumXY[8]; // sums of the products XY (after transformation)
double dSlope[8]; // slopes of the curves (after their transformations into lines)
double dYIntercept[8]; // (transformed) Y intercepts of the transformed lines
double dCorr[8]; // correlations between X and Y arrays (used to determine best fit)
static double C5[30]; // only 1 copy of each array, used by all class instances
static double C10[30];
};
#endif // __stataray_h
/* stataray.cpp 7-1-2000
Copyright (C)1997-2000, 2006 Steven Whitney.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
Version 2 as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
------
Notes:
Code for class SStatArray
#include this file AFTER my.h and mylib.cpp
*/
#include "c:\bcs\library\stataray.h"
//////////////////////////////////////////////////////////////////////////////
//----------------------------------------------------------------------------
// External declarations and initializations for these 2 big arrays.
// comparison values for Chi-Squared testing when degrees of freedom <= 30.
//
// C5: Q(x) = 5% If your X is ABOVE this, it is significantly different from
// the normally distributed population
//
// After testing to compare with book examples, should use C10,
// except I don't know the formulas to calculate C10 when dF > 30, etc.
double SStatArray::C5[30] =
{
3.841455338, 5.991476357, 7.814724703, 9.487728465, 11.07048257, 12.59157742,
14.06712726, 15.50731249, 16.91896016, 18.30702904, 19.67515307, 21.02605538,
22.36202656, 23.68478234, 24.99579669, 26.29622094, 27.58710028, 28.86932099,
30.14350506, 31.41042038, 32.67055801, 33.92445983, 35.17246022, 36.41502646,
37.6524894 , 38.88512964, 40.11326562, 41.33715169, 42.55694751, 43.77295389
};
//----------------------------------------------------------------------------
// C10: Q(x) = 10% If your X is ABOVE this, it is significantly different from
// the normally distributed population
double SStatArray::C10[30] =
{
2.705540585, 4.605176125, 6.251394452, 7.779433964, 9.236349099, 10.64463748,
12.01703137, 13.36156189, 14.68366318, 15.98717471, 17.2750067 , 18.54934022,
19.81193273, 21.0641406 , 22.30712056, 23.54182147, 24.76902818, 25.9894184 ,
27.20356481, 28.41196991, 29.61508586, 30.8132853 , 32.00689016, 33.19623509,
34.38158333, 35.56316367, 36.74122756, 37.91590744, 39.08747534, 40.25601698
};
//----------------------------------------------------------------------------
// constructor
SStatArray::SStatArray(int upper, int lower, int delta)
: TArrayAsVector<complex>(upper,lower,delta)
{
changed = TRUE;
OutputAllStats = TRUE;
} // constructor
//----------------------------------------------------------------------------
// copy constructor
// you can use it to copy all or only a subset of the source object's data
SStatArray::SStatArray(const SStatArray& other, double lowx, double highx,
double lowy, double highy)
: TArrayAsVector<complex>(other.GetItemsInContainer(),0,1)
{
changed = TRUE;
OutputAllStats = other.OutputAllStats;
if(lowx > highx) swap(lowx,highx);
if(lowy > highy) swap(lowy,highy);
int N = other.GetItemsInContainer();
for(int i = 0 ; i < N ; i++)
{
double x = real(other[i]), y = imag(other[i]);
// test INCLUDES both endpoints.
// And the point is only copied if BOTH its X and Y values are within their ranges.
if((x >= lowx) && (x <= highx) && (y >= lowy) && (y <= highy))
Add(other[i]);
}
} // copy constructor
//----------------------------------------------------------------------------
BOOL SStatArray::operator == (const SStatArray& other) const
{
return(&other == this);
}
//----------------------------------------------------------------------------
// assignment operator creates an exact copy
SStatArray& SStatArray::operator = (const SStatArray& other)
{
if(this == &other)
return(*this);
changed = TRUE;
OutputAllStats = other.OutputAllStats;
for(int i = 0 ; i < other.GetItemsInContainer() ; i++)
Add(other[i]);
return(*this);
} // assignment operator
//----------------------------------------------------------------------------
// return the last array item
complex SStatArray::Last()
{
int N = GetItemsInContainer();
if(!N)
return(complex(0,0)); // if empty, return 0
return((*this)[N - 1]); // return the last Y value
}
//----------------------------------------------------------------------------
// add a new value (dependent variable y) to the array. This makes it possible
// to treat the array as an ordinary double variable, but one with a historical record.
// That is, SStatArray s = 24.5 adds 24.5 to the array.
// returns the exact value just added, so it can be used in compound expressions,
// same as assignment of any other numerical type.
double SStatArray::operator = (double d)
{
Add(d);
return(d); // exact passthrough, not (double)(*this), which could change
}
//----------------------------------------------------------------------------
// add a new data pair to the array as a complex. See above.
complex SStatArray::operator = (complex c)
{
Add(c);
return(c); // exact passthrough
}
//----------------------------------------------------------------------------
// allow use as a double.
// The value of the array is that of its most recently added Y (dependent) value.
// It could return alternative values: e.g. its mean, etc.,
// depending on the setting of a member flag.
// To use in some math operations, you must explicitly cast it to (double).
SStatArray::operator double ()
{
return(LastY()); // return the last Y value
}
//----------------------------------------------------------------------------
// allow use as a complex.
// removing this doesn't prevent compiler finding ambiguities in math operations.
SStatArray::operator complex ()
{
return(Last()); // return the latest array item
}
//----------------------------------------------------------------------------
// output stats to a stream
ostream& operator << (ostream& os, SStatArray& s)
{
s.mCalc();
os << "Count " << s.GetItemsInContainer() << endl;
os << "High " << s.High() << endl;
os << "Low " << s.Low() << endl;
os << "Range " << s.Range() << endl;
os << "Sum " << s.Sum(0) << endl;
os << "Mean " << s.Mean(0) << endl;
os << "Median " << s.Median() << endl;
os << "Mode " << s.Mode() << endl;
os << "Mode Freq. " << s.ModeFreq() << endl;
os << "StDevPop " << s.StDevPop(0) << endl;
os << "StDevEst " << s.StDevEst(0) << endl;
os << endl;
os << setiosflags(ios::left) << " " << setw(20) << "Curve (+Best)";
os << setw(15) << "Slope(m or B)";
os << setw(15) << "Y-Int(b or A)" << setw(15) << "Correlation" << endl;
for(int i = (s.OutputAllStats ? 0 : s.BestFit) ; i < 8 ; i++)
{
if(!s.Allow[i]) // don't output transforms that weren't used
continue;
os << ((i == s.BestFit) ? "+" : " ");
os << setw(20) << (tostring(i) + ": " + s.GetCurveText(i));
os << setw(15) << s.Slope(i);
os << setw(15) << s.YIntercept(i);
os << s.Corr(i); // setw() is for the next << operation. For complex, it's the "("!
os << endl;
if(!s.OutputAllStats) // if only BestFit, quit after 1 pass
break;
}
// old version, vertical
// for(int i = (s.OutputAllStats ? 0 : s.BestFit) ; i < 8 ; i++)
// {
// if(!s.Allow[i]) // don't output transforms that weren't used
// continue;
// if(i == s.BestFit)
// os << "=== Best Fit ===" << endl;
// os << "Curve type " << i << ": " << s.GetCurveText(i) << endl;
// os << "Slope " << s.dSlope[i] << endl;
// os << "Y Intercept " << s.dYIntercept[i] << endl;
// os << "Corr " << s.dCorr[i] << endl;
// os << endl;
// if(!s.OutputAllStats) // if only BestFit, quit after 1 pass
// break;
// }
return(os);
}
//----------------------------------------------------------------------------
// write array data only to a stream
ostream& SStatArray::WriteData(ostream& os)
{
for(int i = 0 ; i < GetItemsInContainer() ; i++)
os << real((*this)[i]) << " " << imag((*this)[i]) << endl;
return(os);
}
//----------------------------------------------------------------------------
// write array data only to a file
void SStatArray::WriteData(const string& filename)
{
ofstream outfile(filename.c_str());
WriteData(outfile);
}
//----------------------------------------------------------------------------
// load array data from a stream.
// can read either 1 or 2 column input. (any additional columns are ignored)
// any column can have 1 or 2 items. See rules under Add().
// you must read all the data in one call, to prevent merging data from
// multiple files with different formats.
istream& operator >> (istream& is, SStatArray& s)
{
s.Flush(); // disallow appending data
double x, y;
char buf[128];
is >> ws; // ensure first getline has something to read
while(is.getline(buf,sizeof(buf)))
{
istrstream str(buf);
str >> x >> ws;
if(str) // if there is anything left, it is Y
{
str >> y;
s.Add(x,y);
}
else
s.Add(x); // this x is really the y variable
is >> ws;
}
return(is);
} // operator >>
//----------------------------------------------------------------------------
// read array data from a file
void SStatArray::ReadData(const string& filename)
{
ifstream(filename.c_str()) >> *this;
}
//----------------------------------------------------------------------------
// tabulates given numbers into sum sumofsquares, etc.
void SStatArray::Summate(int i, double tx, double ty)
{
dSum[i] += complex(tx,ty);
dSum2[i] += complex(tx * tx, ty * ty);
dSumXY[i] += tx * ty;
}
//----------------------------------------------------------------------------
// calculate all statistics
// returns: 0 if calculations couldn't be done (no elements)
// 1 if OK
// (this function must not use any of the other calculation functions, because
// they all mCalc, and changed isn't FALSE until THIS Calc() is done. (endless loop))
int SStatArray::Calc()
{
// reset all variables in any event
dHigh = dLow = dRange = dMedian = dMode = dModeFreq = complex(0,0);
for(int i = 0 ; i < 8 ; i++)
{
dSum[i] = dSum2[i] = dMean[i] = dStDevEst[i] = dStDevPop[i] = complex(0,0);
dSumXY[i] = dSlope[i] = dYIntercept[i] = dCorr[i] = 0.;
Allow[i] = TRUE;
}
Allow[QUAD] = FALSE; // can't figure out how to do it
// now see if there is any data to operate on
int N = GetItemsInContainer();
if(!N)
return(0);
// High, Low, Sum, Range, Mean
dHigh = complex(-2e300,-2e300);
dLow = complex(2e300,2e300);
for(i = 0 ; i < N ; i++)
{
double x = real((*this)[i]), y = imag((*this)[i]);
dHigh = complex(max(real(dHigh),x), max(imag(dHigh),y));
dLow = complex(min(real(dLow),x), min(imag(dLow),y));
if(x == 0) Allow[HYPER] = FALSE;
if(y == 0) Allow[INVLINE] = FALSE;
if(x <= 0) Allow[LOG] = Allow[POW] = FALSE;
if(y <= 0) Allow[EXP] = Allow[POW] = FALSE;
// each summation receives data until a data point makes it no longer allowable
if(Allow[LINE]) Summate(LINE, x, y); // no adjustment
if(Allow[LOG]) Summate(LOG, log(x), y); // TI59:SA:84
if(Allow[PARA]) Summate(PARA, x * x, y); // a guess
if(Allow[HYPER]) Summate(HYPER, 1 / x, y); // a guess
if(Allow[INVLINE]) Summate(INVLINE, x, 1 / y); // a guess
if(Allow[EXP]) Summate(EXP, x, log(y)); // TI59:PP:v39
if(Allow[POW]) Summate(POW, log(x), log(y)); // TI59:SA:84
if(Allow[QUAD]) Summate(QUAD, x, y); // no idea
}
dRange = dHigh - dLow;
for(i = 0 ; i < 8 ; i++)
dMean[i] = dSum[i] / N;
// Median, Mode
// Mode could be erratic for double values due to no tolerance in (double) ==
// two virtually identical numbers can be found unequal.
// could add TOLERANCE, which might also allow in effect grouping data
int center = N / 2;
TSArrayAsVector<double> s(N,0,0); // sorted array automatically orders the members
// Median x: find halfway point in x portion
for(i = 0 ; i < N ; i++)
s.Add(real((*this)[i]));
// if even # of elements, interpolate to get a "center" point
// (if N is 1 (and thus center is 0), s[center-1] below won't be calculated)
double mx = ((N % 2) == 1) ? s[center] : (s[center - 1] + s[center]) / 2;
// Mode x
double modex = s[0]; // start with first as the most common
int tx = 1; // and it occurs once
double thisnum = s[0]; // each new number encountered
int tally = 1; // temporary counter for numbers encountered
for(i = 1 ; i < N ; i++)
{
if(s[i] == thisnum) // if it's another occurrence,
tally++; // just increment the counter
else // else if we hit a new #
{
if(tally > tx) // first check if the previous one should be the new mode
{
modex = thisnum; // thisnum is now the previous one
tx = tally; // and update the highest count counter
}
thisnum = s[i]; // now start tallying the new number
tally = 1; // it has already occurred once
}
}
if(tally > tx) // final check: maybe the last # was the most common
{
modex = thisnum;
tx = tally;
}
// Median y: find halfway point in y portion
s.Flush();
for(i = 0 ; i < N ; i++)
s.Add(imag((*this)[i]));
double my = ((N % 2) == 1) ? s[center] : (s[center - 1] + s[center]) / 2;
dMedian = complex(mx,my);
// Mode y
double modey = s[0]; // start with first as the most common
int ty = 1; // and it occurs once
thisnum = s[0]; // each new number encountered
tally = 1; // temporary counter for numbers encountered
for(i = 1 ; i < N ; i++)
{
if(s[i] == thisnum) // if it's another occurrence,
tally++; // just increment the counter
else // else if we hit a new #
{
if(tally > ty) // first check if the previous one should be the new mode
{
modey = thisnum; // thisnum is now the previous one
ty = tally; // and update the highest count counter
}
thisnum = s[i]; // now start tallying the new number
tally = 1; // it has already occurred once
}
}
if(tally > ty) // final check: maybe the last # was the most common
{
modey = thisnum;
ty = tally;
}
// if each # only occurred once, the array doesn't have a mode: ModeFreq allows checking
// if(tx == 1) modex = 0; // 0 previously used as a warning value
// if(ty == 1) modey = 0;
dMode = complex(modex,modey); // now if x or y modeless, it has its s[0] (lowest) value
dModeFreq = complex(tx,ty); // so you should check ModeFreq to make sure > 1
s.Flush(); // frees space if it's large
// Standard Deviations and Regressions
for(i = 0 ; i < 8 ; i++)
{
if(!Allow[i]) // is this transformation legal?
continue;
// StDevEst, StDevPop
// the operand of sqrt won't ever be negative, but can be zero
double b = (N * real(dSum2[i])) - (real(dSum[i]) * real(dSum[i]));
double c = (N * imag(dSum2[i])) - (imag(dSum[i]) * imag(dSum[i]));
// in practice, when they should be 0, they can be tiny negative fractions
if(b < 0) b = 0;
if(c < 0) c = 0;
b = sqrt(b);
c = sqrt(c);
dStDevEst[i] = dStDevPop[i] = complex(b/N, c/N); // Pop calculation is always valid
if(N > 1) // prevent divide by zero if N == 1
dStDevEst[i] = complex(b/sqrt(N * (N - 1)), c/sqrt(N * (N - 1)));
// linear regression
// look for version that minimizes chances of overflow, divide by zero, rounding errors,
// works best for the multi-curve fitting, either resolves Est vs. Pop question
// or makes it irrelevant.
// this set from TI59 book (use of dMean from Excel, Excel otherwise similar)
// it correctly calculates YInt for 10,20,30 as 0
double a = dSumXY[i] - real(dSum[i]) * imag(dSum[i]) / N;
// we're really ensuring b and c don't equal 0
if(b > 0.)
{
dSlope[i] = a / (real(dSum2[i]) - real(dSum[i]) * real(dSum[i]) / N);
dYIntercept[i] = imag(dMean[i]) - dSlope[i] * real(dMean[i]);
}
// for (c==0) horizontal line, dCorr[i] is left at 0, which is correct for the
// correlation between X and Y, but it may not mean that all the points fall
// exactly ON the line (?); can they cloud around it? What is the measure of that?
// probably just the mean, stdev, etc.
if(c > 0.)
dCorr[i] = dSlope[i] * real(dStDevPop[i]) / imag(dStDevPop[i]);
// this set probably from stats books, probably carefully checked, but check again.
// it calculates YInt for 10,20,30 as ...e-14, and uses StDev a lot. (est vs. pop?)
// if((b > 0.) && (c > 0.))
// {
// double a = (N * dSumXY[0]) - (real(dSum[0]) * imag(dSum[0]));
// dCorr[0] = a / (b * c);
// dSlope[0] = dCorr[0] * imag(dStDevPop[0]) / real(dStDevPop[0]);
// // why is this formula so much different from PredictY (and so complicated?)
// dYIntercept[0] = (-dCorr[0] * imag(dStDevPop[0]) * real(dMean[0]) / real(dStDevPop[0])) + imag(dMean[0]);
// }
}
// determine best fit (correlation nearest to 1 or -1)
BestFit = 0;
double compare = 0;
for(i = 0 ; i < 8 ; i++)
if(Allow[i])
if(fabs(dCorr[i]) > fabs(compare))
{
compare = dCorr[i];
BestFit = i;
}
changed = FALSE;
return(1);
} // Calc
//----------------------------------------------------------------------------
double SStatArray::Corr(int i)
{
mCalc();
if(i == -1)
i = BestFit;
if(!Allow[i])
return(0);
return(dCorr[i]);
} // Corr
//----------------------------------------------------------------------------
// slope must be adjusted by regression type (or must it? none of them seem to be)
// see TI59:SA:83 (Texas Instruments TI59 Programmable Calculator, Statistical Analysis module, p.83)
// but what use is the actual value if nonlinear?
double SStatArray::Slope(int i)
{
mCalc();
if(i == -1)
i = BestFit;
if(!Allow[i])
return(0);
double m = dSlope[i];
switch(i)
{
case LINE : break;
case LOG : break; // TI59:SA:84
case PARA : break; // a guess
case HYPER : break; // a guess
case INVLINE: break; // a guess
case EXP : break; // TI59:SA:83
case POW : break; // TI59:SA:84
case QUAD : break; // no idea
}
return(m);
} // Slope
//----------------------------------------------------------------------------
// returns the de-transformed YIntercept value: this is the value you need
// to know to use the exponentiated version of the equation.
// (as opposed to the transposed-to-linear version, which the program MUST use internally)
double SStatArray::YIntercept(int i)
{
mCalc();
if(i == -1)
i = BestFit;
if(!Allow[i])
return(0);
double b = dYIntercept[i];
switch(i) // adjust if entry changes were made to Y (because b is in Y axis units?)
{
case LINE : break;
case LOG : break; // TI59:SA:84
case PARA : break; // no adj: tested against Excel graph
case HYPER : break; // no adj: tested against Excel graph
case INVLINE: break; // no adj: m,b describe an x-axis point for INVLINE
case EXP : b = exp(b); break; // TI59:SA:83
case POW : b = exp(b); break; // TI59:SA:84
case QUAD : break; // no idea
}
return(b);
} // YIntercept
//----------------------------------------------------------------------------
double SStatArray::XIntercept(int i)
{
// mCalc(); // PredictX does all this
// if(i == -1)
// i = BestFit;
double d = PredictX(0, i); // PredictX checks for !Allow[i]
return(d);
}
//----------------------------------------------------------------------------
double SStatArray::PredictX(double y, int i)
{
mCalc();
if(i == -1)
i = BestFit;
if(!Allow[i])
return(0);
// even if the transformation is legal based on the data IN the array,
// it may be illegal to transform the new Y value provided:
if((y == 0) && (i == INVLINE)) return(0);
if((y <= 0) && ((i == EXP) || (i == POW))) return(0);
// transform y before using it, in the same manner it was transformed during data entry
switch(i)
{
case LINE : break;
case LOG : break; // TI59:SA:84
case PARA : break; // a guess
case HYPER : break; // a guess
case INVLINE: y = 1 / y; break; // a guess
case EXP : y = log(y); break; // TI59:PP:v39
case POW : y = log(y); break; // TI59:SA:84
case QUAD : break; // no idea
}
double x = (y - dYIntercept[i]) / dSlope[i];
// x may be valid, but its upcoming transformation may cause a math error, so:
if((x == 0) && (i == HYPER)) return(0);
if((x < 0) && (i == PARA)) return(0);
// re-transform result to raw data (if you transformed X upon data entry?)
switch(i)
{
case LINE : break;
case LOG : x = exp(x); break; // TI59:SA:84
case PARA : x = sqrt(x); break; // a guess
case HYPER : x = 1 / x; break; // a guess
case INVLINE: break; // a guess
case EXP : break; // no adj, TI59:PP:v39
case POW : x = exp(x); break; // TI59:SA:84
case QUAD : x = x; break; // no idea, probably does need adjusting
}
return(x);
} // PredictX
//----------------------------------------------------------------------------
double SStatArray::PredictY(double x, int i)
{
mCalc();
if(i == -1)
i = BestFit;
if(!Allow[i])
return(0);
// even if the transformation is legal based on the data IN the array,
// it may be illegal to transform the new X value provided:
if((x == 0) && (i == HYPER)) return(0);
if((x <= 0) && ((i == LOG) || (i == POW))) return(0);
// transform x before using it in the same manner x was transformed upon data entry
switch(i)
{
case LINE : break;
case LOG : x = log(x); break; // TI59:SA:84
case PARA : x = x * x; break; // a guess
case HYPER : x = 1 / x; break; // a guess
case INVLINE: break; // a guess
case EXP : break; // no adj, TI59:PP:v39
case POW : x = log(x); break; // TI59:SA:84
case QUAD : x = x; break; // no idea, probably does need adjusting
}
double y = dSlope[i] * x + dYIntercept[i];
// y may be valid, but its upcoming transformation may cause a math error, so:
if((y == 0) && (i == INVLINE)) return(0);
// un-transform the result (with the inverse operation done on Y upon data entry)
switch(i)
{
case LINE : break;
case LOG : break; // TI59:SA:84
case PARA : break; // a guess
case HYPER : break; // a guess
case INVLINE: y = 1 / y; break; // a guess
case EXP : y = exp(y); break; // TI59:PP:v39
case POW : y = exp(y); break; // TI59:SA:84
case QUAD : break; // no idea
}
return(y);
} // PredictY
//----------------------------------------------------------------------------
// provide text description of best fit (or any requested) curve
string SStatArray::GetCurveText(int i)
{
mCalc();
if(i == -1)
i = BestFit;
switch(i)
{
case LINE : return("Linear");
case LOG : return("Logarithmic");
case PARA : return("Parabolic");
case HYPER : return("Hyperbolic");
case INVLINE: return("Inverse Linear");
case EXP : return("Exponential");
case POW : return("Power");
case QUAD : return("Quadratic");
}
return("Illegal Curve Number");
}
//----------------------------------------------------------------------------
// return prediction of the next point (x,y) in the sequence:
// next x = lastX + XRange / (N-1) (assumes equal intervals between X values)
// next y = PredictY(x)
// main intended use is for Adapt or similar agents.
complex SStatArray::Next(int i) // you can specify which model to use
{
int N = GetItemsInContainer();
if(N < 2) // to predict, you need 2 points, & prevent div by 0 below
return(complex(0,0));
mCalc();
if(i == -1)
i = BestFit;
// could use dRange[BestFit], then untransform it (?? but dRange is not an array)
double nextx = real((*this)[N - 1]) + real(dRange) / (N - 1);
double nexty = PredictY(nextx, i);
return(complex(nextx,nexty));
} // Next
//----------------------------------------------------------------------------
// find the value of Y most commonly paired with a particular X (when lowx == highx),
// or with a particular range of X values.
// The expected use for this is with time series, or where the values in your
// arrays are actually codes corresponding to consecutive occurrences or states,
// and where you want to know, for example, "When state X occurred, what was the
// state Y that most often followed it?" Once you have the answer, this is a
// situation where you could use a Chi-squared test to determine if the relationship
// is actually significant and you should pay attention to it. (Although in most
// cases, even if it isn't, it still has to be your best guess.)
//
// returns a complex: mode is in real(), its frequency in imag()
// (unlike the others to allow calling the function only once, because it can't just
// return a data member)
//
// Leave this in! You can use the copy constructor to create a subset, but THAT
// subset doesn't contain the information about the other Y values that exist
// (they were excluded), so you can't get expected frequencies for Chi-squared.
// That is, to calc chi squared, you must know how many Y values there are,
// so you can assign equal expected probability to each.
complex SStatArray::ModeYForXBetween(double lowx, double highx)
{
int N = GetItemsInContainer();
if(!N)
return(complex(0,0));
if(lowx > highx)
swap(lowx,highx);
TSArrayAsVector<double> s(N,0,0); // sorted array automatically orders the members
for(int i = 0 ; i < N ; i++)
{
double x = real((*this)[i]);
if((x >= lowx) && (x <= highx)) // test INCLUDES both endpoints
s.Add(imag((*this)[i]));
}
N = s.GetItemsInContainer();
if(!N) // none met the criteria
return(complex(0,0));
double modey = s[0]; // start with first as the most common
int ty = 1; // and it occurs once
double thisnum = s[0]; // each new number encountered
int tally = 1; // temporary counter for numbers encountered
for(i = 1 ; i < N ; i++)
{
if(s[i] == thisnum) // if it's another occurrence,
tally++; // just increment the counter
else // else if we hit a new #
{
if(tally > ty) // first check if the previous one should be the new mode
{
modey = thisnum; // thisnum is now the previous one
ty = tally; // and update the highest count counter
}
thisnum = s[i]; // now start tallying the new number
tally = 1; // it has already occurred once
}
}
if(tally > ty) // final check: maybe the last # was the most common
{
modey = thisnum;
ty = tally;
}
return(complex(modey,ty));
} // ModeYForXBetween
//----------------------------------------------------------------------------
// Convert 1-dimensional data to a time series, such that for each point
// Y is moved into X, and Y becomes the value that followed it in the series.
// (1,10)(2,20)(3,30) becomes (10,20)(20,30). (result set has 1 fewer points)
// No idea what this might be used for, but might be a handy way to get such
// data pairs prepared for transfer to Excel, or be useful if used in a program
// where plotting capabilities are available.
// Like with the ChiSq application, this uses the array differently from normal.
// If you subsequently use the statistical functions, be sure you know what you're doing.
// (since the data was moved around)
// (untested)
BOOL SStatArray::ConvertToTimeSeries()
{
int N = GetItemsInContainer();
if(N < 2) // minimum 2 pairs to have 1 resulting pair
return(FALSE);
int last = N - 1;
for(int i = 0 ; i < last ; i++)
(*this)[i] = complex(imag((*this)[i]),real((*this)[i + 1]));
Destroy(last); // last point has no next (and will set changed to TRUE)
return(TRUE);
} // ConvertToTimeSeries
//----------------------------------------------------------------------------
// Does the best it can to reconvert a time series back to normal form.
// Uses 1,2,3... for the X values. (10,20)(20,30) becomes (1,10)(2,20)(3,30)
// (untested)
BOOL SStatArray::ConvertFromTimeSeries()
{
int N = GetItemsInContainer();
if(!N) // no data
return(FALSE);
double lasty;
for(int i = 0 ; i < N ; i++)
{
lasty = imag((*this)[i]);
(*this)[i] = complex(i + 1, real((*this)[i]));
}
Add(lasty); // restore a point for the final value (and will set changed to TRUE)
return(TRUE);
} // ConvertFromTimeSeries
//----------------------------------------------------------------------------
// Chi squared calculations use the array differently from the others.
// Array should contain expected frequencies in X (real), observed frequencies in Y (imag)
// returns the chi squared values, real = without Yates correction, imag = with Yates.
// to be valid, all expected values should be >= 5, and MUST be > 0.
complex SStatArray::ChiSq()
{
int N = GetItemsInContainer();
complex chisq = complex(0,0);
for(int i = 0 ; i < N ; i++)
{
if(real((*this)[i]) == 0) // prevent divide by 0
continue;
double noyates = imag((*this)[i]) - real((*this)[i]);
double yates = fabs(imag((*this)[i]) - real((*this)[i])) - .5;
chisq += complex((noyates * noyates) / real((*this)[i]), (yates * yates) / real((*this)[i]));
}
return(chisq);
} // ChiSq
//----------------------------------------------------------------------------
// performs a Chi squared test on the array.
// as noted under ModeYForXBetween, even if your best guess isn't significant,
// you may still have to use it. But if you are tracking relationships and memory
// is close to full, you could use a Chi squared test to choose which relationships
// to stop tracking, and delete.
//
// returns TRUE if array is NOT normally distributed (IS significantly different from expected)
// FALSE if array matches expected frequencies (is NOT significantly different)
BOOL SStatArray::ChiSqIsSignif(BOOL useYates)
{
int N = GetItemsInContainer();
if(N < 2) // minimum 2 pairs so V >= 1. V=0 is illegal.
return(FALSE);
double chisq = useYates ? imag(ChiSq()) : real(ChiSq());
double criticalvalue = GetC5();
return(chisq > criticalvalue);
} // ChiSqIsSignif
//----------------------------------------------------------------------------
// returns the 5% critical value for the # of items in the array.
double SStatArray::GetC5()
{
int N = GetItemsInContainer();
if(N < 2) // minimum 2 pairs so V >= 1. V=0 is illegal.
return(MAXDOUBLE); // an obvious error condition value
double criticalvalue;
int dF = N - 1; // degrees of freedom
if(dF <= 30)
criticalvalue = C5[dF - 1]; // dFs start at 1, array indices start at 0
else
if(dF > 100)
criticalvalue = .5 * (1.6449 + sqrt(2 * dF - 1)) * (1.6449 + sqrt(2 * dF - 1));
else
if(dF == 100)
criticalvalue = 124.3421013;
else // > 30 and < 100
criticalvalue = dF * pow(1 - 2/(9 * dF) + 1.6449 * sqrt(2/(9 * dF)),3);
return(criticalvalue);
} // GetC5
//----------------------------------------------------------------------------
// returns the 10% critical value for the # of items in the array.
double SStatArray::GetC10()
{
int N = GetItemsInContainer();
if(N < 2) // minimum 2 pairs so V >= 1. V=0 is illegal.
return(MAXDOUBLE); // an obvious error condition value
double criticalvalue;
int dF = N - 1; // degrees of freedom
if(dF <= 30)
criticalvalue = C10[dF - 1]; // dFs start at 1, array indices start at 0
else
if(dF > 100)
criticalvalue = MAXDOUBLE; // error: calculation unknown
else
if(dF == 100)
criticalvalue = 118.4980018;
else // > 30 and < 100
criticalvalue = MAXDOUBLE; // error: calculation unknown
return(criticalvalue);
} // GetC10
//----------------------------------------------------------------------------
// Array functions
//----------------------------------------------------------------------------
int SStatArray::Add(const complex& c)
{
changed = TRUE;
return(TArrayAsVector<complex>::Add(c));
}
//----------------------------------------------------------------------------
// allow providing x and y as separate doubles
int SStatArray::Add(const double& x, const double& y)
{
return(Add(complex(x,y)));
}
//----------------------------------------------------------------------------
// if you only provide one value, it is the dependent variable, stored in y.
// x is an automatic counter, starting at 1 (not 0) to allow multi-curve fitting.
int SStatArray::Add(const double& y)
{
int x;
int N = GetItemsInContainer();
// original version, x just counts elements
// x = N + 1;
// another prior version: TI59 sets the counter to the previous X value + 1.
// more flexible sequencing. if no elements, start with 1
// x = N ? real((*this)[N - 1]) + 1 : 1;
// latest version:
// if x is regularly spaced, you need only enter enough x values to establish the interval
// you can change the interval at any point by entering enough x values to establish it
switch(N)
{
case 0: x = 1; break; // no info available, start at 1
case 1: x = real((*this)[0]) + 1; break; // no interval, assume + 1
default: // use previous interval
x = real((*this)[N - 1]) + real((*this)[N - 1]) - real((*this)[N - 2]); break;
}
return(Add(complex(x,y)));
}
//----------------------------------------------------------------------------
// adds each char, to s.length() (including null chars)
// allows calculating stats on a string. will be used for wtalk.cpp, bongard string descriptions
int SStatArray::Add(const string& s)
{
for(int i = 0 ; i < s.length() ; i++)
Add((double)s[i]);
return(1); // ?
}
//----------------------------------------------------------------------------
// adds each char up to first null
int SStatArray::Add(const char* s)
{
if(!s)
return(0);
for( ; *s ; s++)
Add((double)(*s));
return(1); // ?
}
//----------------------------------------------------------------------------
int SStatArray::AddAt(const complex& t, int loc)
{
changed = TRUE;
return(TArrayAsVector<complex>::AddAt(t, loc));
}
//----------------------------------------------------------------------------
int SStatArray::AddAt(const double& x, const double& y, int loc)
{
return(AddAt(complex(x,y), loc));
}
//----------------------------------------------------------------------------
int SStatArray::Detach(const complex& t)
{
changed = TRUE;
return(TArrayAsVector<complex>::Detach(t));
}
//----------------------------------------------------------------------------
int SStatArray::Detach(const double& x, const double& y)
{
return(Detach(complex(x,y)));
}
//----------------------------------------------------------------------------
int SStatArray::Detach(int loc)
{
changed = TRUE;
return(TArrayAsVector<complex>::Detach(loc));
}
//----------------------------------------------------------------------------
int SStatArray::Destroy(const complex& t)
{
changed = TRUE;
return(TArrayAsVector<complex>::Destroy(t));
}
//----------------------------------------------------------------------------
int SStatArray::Destroy(const double& x, const double& y)
{
return(Destroy(complex(x,y)));
}
//----------------------------------------------------------------------------
int SStatArray::Destroy(int loc)
{
changed = TRUE;
return(TArrayAsVector<complex>::Destroy(loc));
}
//----------------------------------------------------------------------------
complex& SStatArray::operator [](int loc) const
{
// old: under normal circumstances, you probably wouldn't access an individual data point.
// this assumes that if you DO, you might intend to change it. It can't know if you do
// or not, so changed = TRUE. new: But you can't modify a const object, and this isn't that
// important. If you DO change it, YOU should set changed at that time, anyway.
// changed = TRUE;
return(TArrayAsVector<complex>::operator [](loc));
}
//----------------------------------------------------------------------------
void SStatArray::ForEach(IterFunc iter, void *args)
{
changed = TRUE;
TArrayAsVector<complex>::ForEach(iter, args);
}
//----------------------------------------------------------------------------
complex * SStatArray::FirstThat(CondFunc cond, void *args)
{
changed = TRUE;
return(TArrayAsVector<complex>::FirstThat(cond, args));
}
//----------------------------------------------------------------------------
complex * SStatArray::LastThat(CondFunc cond, void *args)
{
changed = TRUE;
return(TArrayAsVector<complex>::LastThat(cond, args));
}
//----------------------------------------------------------------------------
void SStatArray::Flush()
{
changed = TRUE;
TArrayAsVector<complex>::Flush();
}
//----------------------------------------------------------------------------
// end class SStatArray
//////////////////////////////////////////////////////////////////////////////
/* stats.cpp 1-28-99
Copyright (C)1997-99 Steven Whitney.
Published under GNU GPL (General Public License) Version 2, with ABSOLUTELY NO WARRANTY.
This program uses a SStatArray to calculate statistics about data in a file (test.dat).
It was used for developing and testing the SStatArray class. It could eventually replace
my stats.c, but what stats.c and this do can be done much better in an Excel worksheet.
Developed in Borland C++ 4.0. Platform MSDOS.
*/
#include "c:\bcs\my.h"
#pragma hdrstop
#include "c:\bcs\mylib.cpp"
#include "c:\bcs\library\stataray.cpp"
extern unsigned _stklen = 64000U;
//////////////////////////////////////////////////////////////////////////////
// main
int main(int argc, char**argv)
{
string::set_case_sensitive(0);
// string::set_paranoid_check(1);
string filename("test.dat");
if(argc > 1)
filename = argv[1];
SStatArray s;
s.ReadData(filename);
s.WriteData(cout);
cout << endl;
cout << s << endl;
// section used for testing.
// for use with the various test problems that call for predictions
// double py = 2;
// double px = 9.93;
// cout << s.PredictY(py,0) << " " << s.PredictY(py) << endl;
// cout << s.PredictX(px,0) << " " << s.PredictX(px) << endl;
// cout << "Chi Squared: " << s.ChiSq() << endl;
// cout << "C5: " << s.GetC5() << endl;
// cout << "C10: " << s.GetC10() << endl;
// cout << (s.ChiSqIsSignif() ? "NOT NORMAL" : "NORMAL") << endl;
s = 330;
cout << (double)s << endl;
cout << s.Last() << endl;
double t = (double)s * 10;
cout << t << endl;
presskey();
return(0);
} // end mainAn old, smaller program that reports statistics about data in a file given to
it.
Originally written in DeSmet C. Converting it back to DeSmet should be easy.
/* stats.c BORLAND VERSION 12-6-94
Copyright (C)1994 Steven Whitney.
Published under GNU GPL (General Public License) Version 2, with ABSOLUTELY NO WARRANTY.
Initially published by http://25yearsofprogramming.com.
Reads data from a file and reports statistics about the figures.
Not worth rewriting, but keep as a reminder how the various statistics are calculated.
Eventually use this as a guide for an Excel worksheet that performs the same calculations.
Originally written in DeSmet C; converting it back to DeSmet should be easy.
*/
/* some of these headers might not be needed */
#include <stdio.h>
#include <stddef.h>
#include <dos.h>
#include <stdlib.h>
#include <conio.h>
#include <string.h>
#include <time.h>
#include <ctype.h>
#include <math.h>
#include "my.h"
void main(int argc, char **argv)
{
int columns; /* number of columns in input file */
char inbuf[80];
double x, sumx, sumx2, y, sumy, sumy2, sumxy, n, meanx, meany;
double stddevx, stddevy, corrxy, slope, intercept, predicty;
double a,b,c;
FILE *infile, *outfile;
x = sumx = sumx2 = y = sumy = sumy2 = sumxy = n = meanx = meany = 0.;
stddevx = stddevy = corrxy = slope = intercept = predicty = 0.;
a = b = c = 0.;
if(argc < 3)
{
printf("\nThe stats program can accept files with either one or two\n");
printf("columns of data.\n\n");
aborts("Invoke with: STATS INFILE OUTFILE");
}
if((infile = fopen(argv[1],"rt")) == NULL)
aborts("Input file not found.");
if((outfile = fopen(argv[2],"wt")) == NULL)
aborts("Unable to create output file.");
if(fprintf(outfile,"%3s %7s %7s %7s %7s %7s %7s %7s %7s %7s\n",
"n","x","y","sumy","meany","stddevy","corr","slope","intcpt","->y") == ERR)
aborts("File write error.");
if(fprintf(outfile,"%3s %7s %7s %7s %7s %7s %7s %7s %7s %7s\n",
"---","---","---","----","-----","-------","----","-----","------","--------") == ERR)
aborts("File write error.");
while(fgets(inbuf,80,infile))
{
columns = sscanf(inbuf,"%lf%*[^-+0-9]%lf\n",&x,&y);
if(columns == 1)
y = x;
n++;
sumx += x;
sumx2 += x * x;
sumy += y;
sumy2 += y * y;
sumxy += x * y;
meanx = sumx / n;
meany = sumy / n;
a = (n * sumxy) - (sumx * sumy);
b = sqrt((n * sumx2) - (sumx * sumx));
c = sqrt((n * sumy2) - (sumy * sumy));
stddevx = b/n; /* could be n-1 for sampled data */
stddevy = c/n; /* but it's more complicated than just changing these to n-1 */
if((b > 0.) && (c > 0.))
{
corrxy = a / (b * c);
slope = corrxy * stddevy / stddevx;
intercept = (-corrxy * stddevy * meanx / stddevx) + meany;
predicty = (slope * x) + intercept;
}
else
{
corrxy = 2.0; /* I think this is just a flag that the value is meaningless. */
slope = 0.;
intercept = 0.;
predicty = y;
}
if(fprintf(outfile,"%3.0lf %7.2lf %7.2lf %7.2lf %7.2lf %7.2lf %7.3lf %7.3lf %7.3lf %7.2lf\n",
n,x,y,sumy,meany,stddevy,corrxy,slope,intercept,predicty) == ERR)
aborts("File write error.");
}
fprintf(outfile,"\nstddevx = %lf meanx = %lf sumx = %lf\n",stddevx,meanx,sumx);
fclose(infile);
if(fclose(outfile) == EOF)
aborts("File write error.");
exit(0);
}
|
|
|
|
|
|