MPTrun | index /Users/katherinepaseman/Documents/projects/Analyst/MPTs/2016 MPT/MPTrun.py |
#===========================================================================================================
# Copyright (c) 2006-2015 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
Data | ||
Rpipelines = [
{'name': 'plR_m2', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(2); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 2 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m3', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(3); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 3 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m4', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(4); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 4 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m5', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(5); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 5 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m6', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(6); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 6 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m7', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(7); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 7 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m8', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(8); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 8 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m9', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(9); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, 9 members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"}]
i = 9
plRm = 'computeSecurityReturns(); computeRouwenhorstSecu...erCount(%d); computePortfolioReturns(); report();'
shortTimeFrames = ['1 4 2015 12 15 2016']
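The Rpipelines entries above look like the plRm template filled in with member counts 2 through 9. A minimal sketch of that pattern follows. The stage name `setMemberCount` is a hypothetical placeholder, since the real stage string is truncated in this listing, and `build_pipelines` is not a function from MPTrun.

```python
# Hypothetical template: the real plRm stage string is elided ("...") in the
# listing above, so 'setMemberCount' stands in for the truncated stage name.
plRm_sketch = ('computeSecurityReturns(); setMemberCount(%d); '
               'computePortfolioReturns(); report();')
description = ("Rouwenhorst ranking, %d members, 'Winner/Loser' reporting"
               " - Rouwenhorst JOF 199802")


def build_pipelines(first, last):
    """Return one pipeline dict per member count in [first, last]."""
    return [
        {
            'name': 'plR_m%d' % i,
            'stages': plRm_sketch % i,
            'text description': description % i,
        }
        for i in range(first, last + 1)
    ]


Rpipelines_sketch = build_pipelines(2, 9)
```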
Analyst.MPT.MPT | index /Users/katherinepaseman/Documents/projects/py/Analyst/MPT/MPT.py |
#===========================================================================================================
# Copyright (c) 2006-2015 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
SYNOPSIS
MPT (Modern Portfolio Theory) lets us write financial analysis programs in the form of "pipelines".
Like Unix "pipes", pipelines are straight-line programs that process a data stream from quotes to reports.
See more documentation in the MPT class.
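To make the pipeline idea concrete, here is a minimal sketch of a runner that executes a stage string such as "load(); setCount(3); report('WinnerOnly');" as successive method calls on one object. Everything in it is an assumption for illustration: `ToyPipeline`, its stage names and `run_pipeline` are hypothetical, and the MPT class may dispatch its stages differently.

```python
import ast
import re


class ToyPipeline(object):
    """Hypothetical stand-in for the MPT class; stage names are illustrative only."""

    def __init__(self):
        self.log = []

    def load(self):
        self.log.append('load')

    def setCount(self, n):
        self.log.append('setCount %d' % n)

    def report(self, mode='WinnerLoser'):
        self.log.append('report %s' % mode)


STAGE = re.compile(r"^(\w+)\((.*)\)$")


def run_pipeline(target, stages):
    """Run each 'name(args);' stage in order as a method call on target."""
    for stage in stages.split(';'):
        stage = stage.strip()
        if not stage:
            continue
        match = STAGE.match(stage)
        if match is None:
            raise ValueError('malformed stage: %r' % stage)
        name, raw_args = match.groups()
        # Arguments are restricted to Python literals (ints, quoted strings).
        args = ast.literal_eval('(%s,)' % raw_args) if raw_args.strip() else ()
        getattr(target, name)(*args)
    return target


toy = run_pipeline(ToyPipeline(), "load(); setCount(3); report('WinnerOnly');")
```

Restricting arguments to literals keeps the sketch from evaluating arbitrary code in a stage string.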
Process
Pick a source for historical prices. Default: Yahoo daily
o Yahoo - daily, weekly and monthly data, 20 year time horizon.
o AAII - monthly data only, 10 year window only. Monthly snapshots going back 10 years allow "back testing"
and let us include de-listed tickers, which Yahoo no longer records.
o .hmt files - implicit daily data. We are currently unable to calculate exact dates for the samples,
which inhibits some reporting.
Create Criteria: Each criterion consists of 4 things:
- A Name
- A Security name set, consisting of a set of ticker symbols.
- A Benchmark market index
- A Riskless return index
E.g. Divide "ticker space" into 4 based on their exchange: American, NYSE, NASDAQ, and OTC
o American, Nasdaq and NYSE have fewer "Bad Tickers" than OTC. So we address "selection bias"
(http://en.wikipedia.org/wiki/Selection_bias) by partitioning AAII tickers by these exchanges.
o Note: We can filter (or partition) the AAII list by any AAII attribute. (../References/20040206 Column Table.xls)
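The exchange partition described above amounts to grouping tickers by one attribute value. A minimal sketch, assuming rows of (ticker, EXCHG_DESC) pairs; the tickers are made up and `partition_by_attribute` is a hypothetical helper, not part of the AAII module.

```python
from collections import defaultdict

# Hypothetical sample rows: (ticker, EXCHG_DESC). Tickers are made up.
sample_rows = [
    ('AAA', 'A - American'),
    ('BBB', 'N - New York'),
    ('CCC', 'M - Nasdaq'),
    ('DDD', 'O - Over the counter'),
    ('EEE', 'N - New York'),
]


def partition_by_attribute(rows):
    """Group tickers into sets keyed by the attribute value."""
    partitions = defaultdict(set)
    for ticker, value in rows:
        partitions[value].add(ticker)
    return dict(partitions)


by_exchange = partition_by_attribute(sample_rows)
```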
Select a set of timeframes
o Note: the longer the timeframe, the more "bad tickers" are created
Run pipeline for each timeframe, for each criterion, for each ticker set in 3 steps:
o Init
- Timeframe: --No Default--: Timeframe introduces the greatest variability
- Period - (m)onth, (w)eek, (d)ay: Default: "d"
o Load
- Ticker set(s): Default: American, NYSE, NASDAQ, and OTC
- Benchmark index: Default: American- %5EDJI, NYSE- %5EDJI, NASDAQ - %5EIXIC, and OTC - %5ERUA
- Riskless Return index: Default: %5EH15_TB_M6
o Run
- "j" and "k" lags: Default: [65,130,195,260] days (= 3,6,9,12 months)
- Portfolio Count: Default: 10 - Note: Prior to 20100516, we deleted items from the middle of the set.
Now we default to creating an "unused pool" and report a factorization of the securityNames length
if the specified portfolioCount does not evenly divide the number of securityNames.
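A minimal sketch of the Portfolio Count behaviour described above: equal-sized portfolios, an "unused pool" for the remainder, and a factorization of the list length to help pick a count that divides evenly. It assumes the leftovers are taken from the middle of the ranking so the winner and loser ends stay intact; the source does not say where the pool comes from. Both function names are hypothetical.

```python
def factorize(n):
    """Return the prime factors of n in ascending order."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def split_into_portfolios(ranked_names, portfolio_count):
    """Return (portfolios, unused_pool) for a ranked list of security names."""
    size, leftover = divmod(len(ranked_names), portfolio_count)
    # Assumption: leftovers come from the middle of the ranking.
    start = (len(ranked_names) - leftover) // 2
    unused_pool = ranked_names[start:start + leftover]
    kept = ranked_names[:start] + ranked_names[start + leftover:]
    portfolios = [kept[i * size:(i + 1) * size] for i in range(portfolio_count)]
    return portfolios, unused_pool


names = ['S%02d' % i for i in range(23)]  # made-up security names
portfolios, unused_pool = split_into_portfolios(names, 10)
```

With 254 names, `factorize(254)` shows that only 2 and 127 divide the list evenly, which is the kind of hint the report gives when the requested count does not fit.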
Generate Web page with Reports
o Running Time + Ticker loading Errors to see that we haven't introduced selection bias
o Verification that the test populations have log normal returns.
o Rouwenhorst Table I and Table II (including "p"s along with t-test)
- returnsTtest = stats.ttest_ind(winnerReturns, indexReturns)
- (http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#comparing-two-samples)
- winnerLoserTtest = stats.ttest_rel(winnerReturns, loserReturns)
- (http://www.astro.uu.nl/~werkhvn/docs/man/docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html)
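For readers without scipy at hand, the two t statistics named above can be sketched in pure Python. This is an illustration of the formulas only: it yields the statistic, not the p-value, and the return series are made up. `paired_t` corresponds to the statistic reported by stats.ttest_rel, and `pooled_t` to stats.ttest_ind with its default pooled variance.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic for paired samples (same periods, two portfolios)."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def pooled_t(a, b):
    """t statistic for two independent samples with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1.0 / na + 1.0 / nb))


# Made-up monthly returns, for illustration only.
winnerReturns = [0.02, 0.01, 0.03, 0.00, 0.02]
loserReturns = [0.01, 0.00, 0.01, -0.01, 0.02]

winnerLoserT = paired_t(winnerReturns, loserReturns)
returnsT = pooled_t(winnerReturns, loserReturns)
```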
TODO
o Make Multifactor.py an MPT instruction.
o Create Gold Test Set for Rouwenhorst, Multifactor and CC.
o Debug “I'm sorry, Dave, I'm afraid I can't do that.”
o src links in html results are off by one character. - listDictAnchorifyValues
o 20150120 - screening-for-stock-using-the-lakonishok-approach -
o Incorporate AAIIdbf
TBD20150320: period is used inconsistently. 'd' is needed for hmtload. 'm' is needed to make ratioToMonthlyReturn match the multifactor baseline
# Suggest we remove the 'd' check in load, and use period solely for ratioToMonthlyReturn input.
# Check if it conflates with J,K. i.e. are j,k specified in days, weeks or months?
# So we have three so far: sample period units in Hmt input file; train/test period (j/k) units and reporting period %/mo - %/yr
TBD20150917
pass ,_p=p,gapReport=True
from MPT.loadSecurityNames
to StockDb.loadSecurityNames
to loadFromStockhistoryDir into loadFromStockhistoryDir2 (MPTload)
MPT.loadSecurityNames(self, src, securityNames, indexSecurityName, risklessReturnSecurityName)
StockDb.loadSecurityNames(self, src, securityNames)
earlyDate,lateDate = getDates(self.timeFrame)
self.securityNames, self.quotes, self.dates =
loadFromStockhistoryDir(pathname,securityNames,earlyDate,lateDate,self.period,self)
loadFromStockhistoryDir2(pathname,securityNames,earlyDate,lateDate,period,self.p)
Future:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2533943
- How do we validate results?
Train. Then measure market α, β, q and p during test period.
Question 1: How well does the model eliminate β when p and q match?
Would you expect a plot of dividend return rates versus volatility to show an efficient frontier?
How "volatile" is the portfolio? i.e. do the portfolio members change drastically with one time step?
If so, why? If not, why not? Which "regions" produce the best results, those with volatile portfolio members or stable ones?
Suppose we find a parameter set that biases results into a realm that shows MPT type behavior (e.g. efficient market line).
Will MPT style theories work especially well there?
Couple Hellmut's Appspot regression analysis to MPT
p32 - Graphic 4 - drift with sicherheits puffer
Incorporate "Improved Estimates of Correlation Coefficients And Their Impact on the Optimum Portfolios"
o Add Sortino ratio; Sharpe Ratio
o Add Hellmut's mechanism to calculate optimal quantities
o Incorporate "bornDuringInterval/diesDuringInterval" tickers
o Fri, Feb 18, 2011 at 11:00 AM - select market segments. E.G. mid caps. Or small caps. Or industries.
- 1st day in month
- Most Traded
- Exchanges
o http://matplotlib.sourceforge.net/examples/axes_grid/scatter_hist.html
o pull over documentation from below "generateDocumentation" in 20100909MPT_.py
FIX QD
var(dof)
Hellmut
p44,53
use sample betas vs. index betas
Rouwenhorst with beta+ and beta- with Beta Ranking
page 67 alpha strategy for derivatives, p 25
cov = 10.A2 to calculate running Beta for above
See also Rp278 to show that buying losers is OK.
4 long; 4 short; sell after 2 mos
Per Security Data is kept in dictionaries indexed by securityNames:
quotes, securityReturns, securityRankings
Ratios
Treynor: T = (Ri - Rf) / βi
T = Treynor ratio,
Ri = portfolio i's return,
Rf = risk free rate,
βi = portfolio i's beta
Sharpe = (Compound ROR – Rf) / (Standard Deviation of Returns)
Sortino = (Compound ROR – Rf) / (Standard Deviation of Negative Returns)
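The three ratios above can be sketched directly from their definitions. Two assumptions are made here and are not confirmed by the source: "Compound ROR" is read as the geometric mean per-period return, and the Sortino denominator is taken literally as the sample standard deviation of the negative returns (so at least two negative returns are needed). The input numbers are made up.

```python
import statistics


def treynor(portfolio_return, riskless_return, beta):
    """T = (Ri - Rf) / beta_i."""
    return (portfolio_return - riskless_return) / beta


def compound_ror(period_returns):
    """Assumed meaning of 'Compound ROR': geometric mean of (1 + r), minus 1."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(period_returns)) - 1.0


def sharpe(period_returns, riskless_return):
    """(Compound ROR - Rf) / standard deviation of returns."""
    return (compound_ror(period_returns) - riskless_return) / statistics.stdev(period_returns)


def sortino(period_returns, riskless_return):
    """(Compound ROR - Rf) / standard deviation of the negative returns."""
    negatives = [r for r in period_returns if r < 0]
    return (compound_ror(period_returns) - riskless_return) / statistics.stdev(negatives)


sample_returns = [0.05, -0.02, 0.03, -0.04]  # made-up per-period returns
sample_rf = 0.001                            # made-up per-period riskless rate
```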
#===========================================================================================================
CHANGELOG
distinguish debug log, results log (tab file)
20110212
Test Speed
Speedup occurs because:
o The numpy array routines run primarily in "C".
In essence the python interpreter is invoked only 18 times for Rouwenhorst.
o We precompute security returns at all lags of interest and reuse them during the computation.
This also reduces space requirements.
o We only look at Rouwenhorst Winner/Loser, no more intermediate portfolios.
Slowdown occurs because:
o there is unnecessary calculation at the array level (e.g. the Xl array is zero above membercount,
but the numpy array multiplier does not "see" this and so does the multiplication anyway).
Note: this version of MPT seems to be about 10x faster than prior versions. Runtime is now dominated by tickerload.
Net, net: 20110215 takes about 20 minutes to run. 20100830 used to take more than 3 hours (roughly a 10x speedup).
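The "precompute security returns at all lags of interest" point can be sketched as follows. This is a pure-Python illustration with made-up quotes; the production code is described as using numpy arrays, and the use of log returns here is an assumption. `lagged_log_returns` is a hypothetical name.

```python
import math


def lagged_log_returns(prices, lags):
    """Return {lag: [log(p[t] / p[t - lag]) for every valid t]}."""
    return {
        lag: [math.log(prices[t] / prices[t - lag]) for t in range(lag, len(prices))]
        for lag in lags
    }


prices = [100.0, 102.0, 101.0, 105.0, 110.0]  # made-up quotes
returns_by_lag = lagged_log_returns(prices, [1, 2])
```

Computing each lag once and reusing the result avoids recomputing the same price ratios for every j/k combination.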
Using as a base case: Rouwenhorst membercount = 3 DJIA TimeFrame: 1 1 2008 1 1 2010 Period: m Component Count: 30
Compared http://www.paseman.com/projects/Analyst/Hellmut/20100920/20100920RouwenhorstDebug.txt
to output of 20100909MPT, 20110212MPTalt and 20110212MPT.
Compared Rouwenhorst membercount = 10 AMEX TimeFrame: 1 14 2005 5 14 2008 Period: d Component Count: 254
between 20100909(MPT) and 20110212(MPTalt)
Bug 1: ComputeRowenhorstXs used j instead of validFirstIndex, and so gave an index error when used in a CC pipeline
Bug 2: Thought ComputeRowenhorstXs set portfolio, changed name to setXsMemberCount and added setXsPortfolioCount
20110214
Compared output of 20100909MPT and 20110212MPT(goldTestMPT) for plCC_p10 j/k=3,3
Bug 1: in 20100909MPT: riskless return needs to be divided by 12*100, not 12 in
Bug 2: in 20100909MPT: 1/tau was hardwired to 250 - changed hardwire to 12 for testing
Bug 3: Winners/Losers bugs in report
20110217
Bug 1: Hit an interesting bug in Gruber's calculations
Some thinly traded stocks maintain the same price for an entire volatility "window" (e.g. CXM).
As such, their volatility and average return is zero for that timeframe.
so for finite positive Rf, Zi = f((Ri - Rf)/sigma) = - infinity
and Xi (= Zi/Sum(Zi)) is garbage.
This is what produces some of the NaN (Not a number) results in the 20110215 run.
I have altered the code to discard these securities at run time.
To simplify my life, I discard the security for all j's
Bug 2: Another source of NaN results is that for small enough values, variance becomes negative (e.g. -1E-17)
and taking the square root produces NaN.
RunningVariance now takes the absolute value of the result
It produces no change in Rouwenhorst
Note 3: We can still generate NaN results. E.g. if the time frame is a year and the volatility window is also a year,
no results are produced and nothing/nothing = NaN
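A minimal sketch of the two guards described in the 20110217 entry: taking the absolute value of a variance that rounding may push slightly below zero, and discarding securities whose volatility is zero before any division by sigma. The function names and the `tolerance` parameter are hypothetical; the actual RunningVariance code is not shown in this document.

```python
import math


def population_variance(values):
    """One-pass variance E[x^2] - E[x]^2, which rounding can push slightly below zero."""
    n = float(len(values))
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    # The changelog says RunningVariance takes the absolute value; do the same here.
    return abs(mean_of_squares - mean * mean)


def drop_flat_securities(returns_by_name, tolerance=1e-12):
    """Discard securities whose volatility is (numerically) zero.

    'tolerance' is a hypothetical threshold added for this sketch.
    """
    return {
        name: returns
        for name, returns in returns_by_name.items()
        if math.sqrt(population_variance(returns)) > tolerance
    }


sample = {
    'FLAT': [0.0, 0.0, 0.0],      # price never moved inside the window
    'LIVE': [0.01, -0.02, 0.03],
}
usable = drop_flat_securities(sample)
```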
20110218
Bug 1: Russell 200 has time gaps, patched manually in index file
Feature 1: Plot R vs. sigma for CC and R. - page 8
20110227-20110301
o Report what stocks to buy today for each strategy
o Loop over timeframes to determine timeframe sensitivity
o Run tests to see what long/short "numberOfSecuritiesPerPortfolio" works best - From Sat, May 8, 2010 at 12:34 AM: Why do you need 10 deciles? Experience shows that not the deciles but e.g. 4 long, 4 short was optimal for the Dax. It may be different for other indices. If I understood R. well, he used these deciles only to see whether there are any peaks in one ranked decile. The literature also uses thirds instead of tenths. The most interesting case is the selection of an optimal number of stocks for going long and an optimal number of stocks for going short. If this is true, then you can also use e.g. 5 long and 4 short if it is useful.
o Run Hellmut's favorite lag -
201412
o Hooked in MPTalt procedures as MPT methods.
QD20150320: Discovered that hmtload kept quotes in reverse order
Data | ||
Mpl = "computeSecurityReturns(); computeSecurityVolatil...ePortfolioReturns(); dump(); report('WinnerOnly')"
exp = <ufunc 'exp'>
inf = inf
log = <ufunc 'log'>
loserPortfolioIdx = -1
mgrid = <numpy.lib.index_tricks.nd_grid object>
mpt = <Analyst.MPT.MPT.MPT instance>
pipelines = [
{'name': 'plR_p3', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...lioCount(3); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, Three portfolios, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_p10', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...ioCount(10); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, Ten portfolios, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m3', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(3); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, Three members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plR_m10', 'stages': 'computeSecurityReturns(); computeRouwenhorstSecu...erCount(10); computePortfolioReturns(); report();', 'text description': "Rouwenhorst ranking, Ten members, 'Winner/Loser' reporting - Rouwenhorst JOF 199802"},
{'name': 'plCC_p10', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ioCount(10); computePortfolioReturns(); report();', 'text description': "Constant Correlation ranking, Ten portfolios, 'Winner/Loser' reporting - Elton/Gruber 6thEd p195"},
{'name': 'plCC_XlsW0R40', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();', 'text description': 'Constant Correlation ranking, Two portfolios, Rh... reporting - Elton/Gruber 6thEd p195 j volatility'},
{'name': 'plCC_XlsW0R50', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();', 'text description': 'Constant Correlation ranking, Two portfolios, Rh... reporting - Elton/Gruber 6thEd p195 j volatility'},
{'name': 'plCC_XlsW0R60', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();', 'text description': 'Constant Correlation ranking, Two portfolios, Rh... reporting - Elton/Gruber 6thEd p195 j volatility'},
{'name': 'plCC_XlsW260R40', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();', 'text description': 'Constant Correlation ranking, Two portfolios, Rh...orting - Elton/Gruber 6thEd p195 12 mo volatility'},
{'name': 'plCC_XlsW260R50', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();', 'text description': 'Constant Correlation ranking, Two portfolios, Rh...orting - Elton/Gruber 6thEd p195 12 mo volatility'},
{'name': 'plCC_XlsW260R60', 'stages': 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();', 'text description': 'Constant Correlation ranking, Two portfolios, Rh...orting - Elton/Gruber 6thEd p195 12 mo volatility'},
{'name': 'plCC_XlW0R40', 'stages': "computeSecurityReturns(); computeSecurityVolatil... computePortfolioReturns(); report('WinnerOnly');", 'text description': "Constant Correlation ranking, C* portfolio forma...inner Only' reporting - Elton/Gruber 6thEd p195-6"},
{'name': 'plCC_XlW0R50', 'stages': "computeSecurityReturns(); computeSecurityVolatil... computePortfolioReturns(); report('WinnerOnly');", 'text description': "Constant Correlation ranking, C* portfolio forma...inner Only' reporting - Elton/Gruber 6thEd p195-6"},
{'name': 'plCC_XlW0R60', 'stages': "computeSecurityReturns(); computeSecurityVolatil... computePortfolioReturns(); report('WinnerOnly');", 'text description': "Constant Correlation ranking, C* portfolio forma...inner Only' reporting - Elton/Gruber 6thEd p195-6"}]
plCC = 'computeSecurityReturns(); computeSecurityVolatil...teCCX%s(); computePortfolioReturns(); report(%s);'
plCC_XlW0R40 = "computeSecurityReturns(); computeSecurityVolatil... computePortfolioReturns(); report('WinnerOnly');"
plCC_XlW0R50 = "computeSecurityReturns(); computeSecurityVolatil... computePortfolioReturns(); report('WinnerOnly');"
plCC_XlW0R60 = "computeSecurityReturns(); computeSecurityVolatil... computePortfolioReturns(); report('WinnerOnly');"
plCC_XlsW0R40 = 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();'
plCC_XlsW0R50 = 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();'
plCC_XlsW0R60 = 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();'
plCC_XlsW260R40 = 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();'
plCC_XlsW260R50 = 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();'
plCC_XlsW260R60 = 'computeSecurityReturns(); computeSecurityVolatil...ongShorts(); computePortfolioReturns(); report();'
plCC_p10 = 'computeSecurityReturns(); computeSecurityVolatil...ioCount(10); computePortfolioReturns(); report();'
plR = 'computeSecurityReturns(); computeRouwenhorstSecu...ioCount(%d); computePortfolioReturns(); report();'
plRR_m10 = 'Rouwenhorst(10); report();'
plRR_m3 = 'Rouwenhorst(3); report();'
plRRm = 'Rouwenhorst(%d); report();'
plR_m10 = 'computeSecurityReturns(); computeRouwenhorstSecu...erCount(10); computePortfolioReturns(); report();'
plR_m3 = 'computeSecurityReturns(); computeRouwenhorstSecu...berCount(3); computePortfolioReturns(); report();'
plR_p10 = 'computeSecurityReturns(); computeRouwenhorstSecu...ioCount(10); computePortfolioReturns(); report();'
plR_p3 = 'computeSecurityReturns(); computeRouwenhorstSecu...lioCount(3); computePortfolioReturns(); report();'
plRm = 'computeSecurityReturns(); computeRouwenhorstSecu...erCount(%d); computePortfolioReturns(); report();'
template = '<html>\n<head>\n<title>%s %s</title>\n\n<script type...1>%s %s</h1><br><h2>%s</h2><br>%s</body>\n</html>\n'
winnerPortfolioIdx = 0
Analyst.MPT.MPTalt | index /Users/katherinepaseman/Documents/projects/py/Analyst/MPT/MPTalt.py |
#===========================================================================================================
# Copyright (c) 2006-2014 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
This file contains a straightforward implementation of Rouwenhorst, Multifactor and Gruber for testing and exposition.
The production version of these algorithms is in MPT, which slices up and shares pieces of the algorithms.
Both try to program in an APL style,
i.e. few loops, just straight line code using composition and reduction operators on arrays.
As in machining or baking cookies, we are given a "sheet" of numbers and we just punch out the right pattern....
Data | ||
inf = inf loserPortfolioIdx = -1 mgrid = <numpy.lib.index_tricks.nd_grid object> sqrt = <ufunc 'sqrt'> winnerPortfolioIdx = 0 |
Analyst.MPT.Multifactor | index /Users/katherinepaseman/Documents/projects/py/Analyst/MPT/Multifactor.py |
#===========================================================================================================
# Copyright (c) 2006-2015 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
# HTC: is "Hellmut Test Case" for DAX_20141014.hmt in
projects/Analyst/Analyst\ Hellmut/20141115\ Hellmut\ -\ Is\ Systematic\ Risk\ diversifiable/Multifactor.py
Data | ||
loserPortfolioIdx = -1 winnerPortfolioIdx = 0 |
Analyst.MPTutil | index /Users/katherinepaseman/Documents/projects/py/Analyst/MPTutil.py |
#===========================================================================================================
# Copyright (c) 2006-2015 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
Used by MPTalt and MPT
Data | ||
abs = <ufunc 'absolute'> log = <ufunc 'log'> loserPortfolioIdx = -1 maxBias = 21.0 minimum = <ufunc 'minimum'> oneOverTauTable = {'d': 252.0, 'm': 12.0, 'w': 52.0, 'y': 1.0} sqrt = <ufunc 'sqrt'> winnerPortfolioIdx = 0 |
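One plausible use of oneOverTauTable, suggested by the 20110214 changelog note that a monthly riskless return must be divided by 12*100, is converting an annual rate quoted in percent into a per-period fraction. The sketch below is an assumption about that usage, not the MPTutil implementation, and `riskless_return_per_period` is a hypothetical name. It uses simple division and ignores compounding.

```python
# Values copied from the Data listing above.
oneOverTauTable = {'d': 252.0, 'm': 12.0, 'w': 52.0, 'y': 1.0}


def riskless_return_per_period(annual_percent, period):
    """Convert an annual rate in percent to a per-period fraction (no compounding)."""
    return annual_percent / (oneOverTauTable[period] * 100.0)


monthly_rf = riskless_return_per_period(6.0, 'm')  # 6 %/yr is a made-up rate
```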
Analyst.MPTload | index /Users/katherinepaseman/Documents/projects/py/Analyst/MPTload.py |
#===========================================================================================================
# Copyright (c) 2006-2015 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
histories have gaps.
We addressed this in the last version of this code by using the index (first history loaded) as a master.
This approach fails when the index has gaps.
We address this (generally) here by constructing a master timeline (similar to pandas) from all histories.
In Particular:
1) Copy all ticker histories at once from the internet.
2) Index each quote in the history by 'date'.
3) Add all the history's 'date' indices to a 'set'.
4) Sort the 'set' to construct a complete 'date' list.
5) Reconstruct each history. Pull quotes in 'date' list order from each indexed history.
6) If the quote does not exist in the index, use the entry from the prior day and report the missing 'date'.
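Steps 2 through 6 above can be sketched as follows, assuming each history has already been downloaded into a dictionary keyed by an ISO date string (which sorts chronologically). `align_histories` is a hypothetical name, the quotes are made up, and leaving leading gaps as None is an assumption of this sketch.

```python
def align_histories(histories):
    """histories: {ticker: {date_string: quote}} -> (dates, aligned, missing).

    dates   : sorted union of every date seen in any history
    aligned : {ticker: [quote per date]}, gaps filled from the prior day
    missing : {ticker: [dates that had to be filled]}
    """
    dates = sorted(set(d for history in histories.values() for d in history))
    aligned, missing = {}, {}
    for ticker, history in histories.items():
        quotes, gaps, previous = [], [], None
        for date in dates:
            if date in history:
                previous = history[date]
            else:
                gaps.append(date)
            # Assumption: dates before a ticker's first quote stay None.
            quotes.append(previous)
        aligned[ticker] = quotes
        missing[ticker] = gaps
    return dates, aligned, missing


sample_histories = {
    'IDX': {'2016-01-04': 10.0, '2016-01-06': 12.0},           # index with a gap
    'ABC': {'2016-01-04': 5.0, '2016-01-05': 5.5, '2016-01-06': 6.0},
}
dates, aligned, missing = align_histories(sample_histories)
```

Because the master date list comes from all histories, a gap in the index no longer removes that date for every other ticker.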
Data | ||
DAXcomponents = ['ADS.DE', 'ALV.DE', 'BAS.DE', 'BAYN.DE', 'BEI.DE', 'BMW.DE', 'CBK.DE', 'CON.DE', 'DAI.DE', 'DB1.DE', 'DBK.DE', 'DPW.DE', 'DTE.DE', 'EOAN.DE', 'FME.DE', 'FRE.DE', 'HEI.DE', 'HEN3.DE', 'IFX.DE', 'LHA.DE', ...] |
Analyst.StockDb | index /Users/katherinepaseman/Documents/projects/py/Analyst/StockDb.py |
#===========================================================================================================
# Copyright (c) 2006-2011 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
# StockDb returns an array of quotes for each of a specified list of securityNames for a timeFrame.
#
# **Dirty Data**
# AAII <> Yahoo
# AAII and Yahoo give different dieDuringInterval values for the same interval!!!
# This is to be expected, since each increases prior prices by the amounts of intervening dividends.
# So prior AAII dbs of quotes are totally out of sync with Yahoo's current price set.
# TBD: get Yahoo -monthly- snap at same time AAII does and compare AAII to Yahoo values.
#
# **Selection Bias**
# This module can introduce selection bias via "Incomplete Histories" and "Suspicious Quotes".
#
# >>>Incomplete Histories
# Of the following cases, we are most concerned about the first one:
# - Quote histories start before beginning of timeframe and end before end of timeframe
# - Quote histories start before beginning of timeframe and end after end of timeframe
# - Quote histories start after beginning of timeframe
# - Quote histories start after end of timeframe
# - Quote histories end before beginning of timeframe
# StockDb reports tickers with incomplete histories over the requested intervals.
# Those which come into existence during the interval (bornDuringInterval) are OK.
# Those which end during the interval (diedDuringInterval) are discarded by the code, so there is a potential to undercount losses.
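A minimal sketch of the incomplete-history classification described above. The labels 'bornDuringInterval' and 'diedDuringInterval' come from the text; 'outsideInterval' and 'complete' are hypothetical labels added for this sketch, and a ticker that is both born and delisted inside the interval is reported as born.

```python
from datetime import date


def classify_history(first_quote, last_quote, frame_start, frame_end):
    """Classify a quote history against a timeframe; all arguments are datetime.date."""
    if last_quote < frame_start or first_quote > frame_end:
        return 'outsideInterval'        # hypothetical label
    if first_quote > frame_start:
        return 'bornDuringInterval'
    if last_quote < frame_end:
        return 'diedDuringInterval'
    return 'complete'                   # hypothetical label


# Timeframe taken from shortTimeFrames in MPTrun: '1 4 2015 12 15 2016'.
frame = (date(2015, 1, 4), date(2016, 12, 15))
died = classify_history(date(2010, 1, 4), date(2016, 3, 1), *frame)
born = classify_history(date(2015, 6, 1), date(2017, 1, 3), *frame)
```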
#
# >>>Suspicious Quotes
# Quotes which reach zero over their lifetime, or drop/rise by a large factor in one period (day).
# Moving the return factor threshold from a 5x drop to a 10x drop allowed all NYSE stocks to pass.
# American stocks: IVD has spike in 2008.
# WAC went from 0.13 to 5.30 with zero shares traded.
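The suspicious-quote test can be sketched as a scan for non-positive quotes and for one-period moves beyond a factor. This is an illustration only: `suspicious_quotes` and its `max_factor` default of 10 are assumptions based on the 10x threshold mentioned above, not the StockDb code.

```python
def suspicious_quotes(quotes, max_factor=10.0):
    """Return indices where a quote is non-positive or moves by more than
    max_factor (up or down) relative to the previous period."""
    flagged = []
    for i, quote in enumerate(quotes):
        if quote <= 0:
            flagged.append(i)
            continue
        previous = quotes[i - 1] if i > 0 else None
        if previous is not None and previous > 0:
            ratio = quote / previous
            if ratio > max_factor or ratio < 1.0 / max_factor:
                flagged.append(i)
    return flagged


# The WAC example from the text: 0.13 to 5.30 is roughly a 40x rise.
wac_flags = suspicious_quotes([0.13, 5.30])
```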
#
# Limiting by exchanges is the best way to address both incomplete histories and Suspicious Quotes.
# Below is a run for the four AAII exchange classification: American, NASDAQ, OTC and NYSE for downloaded histories:
# OTC has the most anomalies and Nasdaq, NYSE and American the least
#
# o Step 1 - alert to the condition
# o Step 2 - narrow selection criteria so that we have few companies without a complete history
# - Will not account for M&A, e.g. Acquisition of Schering-Plough
# o Step 3 - keep selection criteria wide, but account for new stocks coming in, or old stocks going out
# - Lot of Coding!
# TBD: (Somehow) incorporate prior implementation which kept quotes as a dictionary, keyed by time
Analyst.history | index /Users/katherinepaseman/Documents/projects/py/Analyst/history.py |
#===========================================================================================================
# Copyright (c) 2006-2007 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
#===========================================================================================================
# history
# - downloads ticker price histories from the internet to a file
# - loads file into a dictionary
# - Note that the only connection between histories and AAIIdb is the ticker symbol
# File Format is: "Date" "Open" "High" "Low" "Close" "Volume" "Adj Close"
# Usage:
# downloadHistories(["SAP","T"]) # puts history file for "SAP" and "T" tickers in local folder
# myDictionary = loadHistory("SAP") # converts most recent downloaded "SAP" file to dictionary
# p=myDictionary["2007-02-15"]["Adj Close"] # Return Adjusted close for SAP on February 15, 2007
#===========================================================================================================
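The usage above implies a history dictionary keyed first by date and then by column name. A minimal sketch of building one from a file in the documented column order, assuming comma-separated values (the files are named .csv); `load_history_sketch` is hypothetical, not the module's loadHistory, and the rows are made up.

```python
import csv
import io

# Made-up rows in the documented column order; not real quotes.
sample_file = io.StringIO(
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2007-02-15,10.0,10.5,9.9,10.2,1000,10.1\n"
    "2007-02-14,9.8,10.1,9.7,10.0,1200,9.9\n"
)


def load_history_sketch(handle):
    """Return {date: {column: string value}} from a comma-separated history file."""
    return {row['Date']: row for row in csv.DictReader(handle)}


myDictionary = load_history_sketch(sample_file)
p = float(myDictionary["2007-02-15"]["Adj Close"])
```

Values stay strings until the caller converts them, which keeps the loader independent of which columns are numeric.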
# DateStampedPath File Organization
# Description
# The "history" directory contains "byDate" directories.
# "byDate" directories contain "byPeriod" directories.
# "byPeriod" directories contain histories of all ticker symbols listed in the current AAII db
# A "byDate" directory name is of the format YYYYMMDD and records ticker histories for different periods
# A "byPeriod" directory name is one of "m" month "w" week and "d" day and records ticker histories for years prior to YYYYMMDD
# - e.g. 20070303 contains the histories of all tickers up to and including March 3, 2007.
#
# A "byTicker" directory contains one "tickerHistory" file for each ticker Symbol.
# Each "tickerHistory" file has the name of the ticker symbol whose history it contains.
# A "byTicker" directory takes about 304 megabytes on disk and takes about 2 hours to create over broadband.
#
# Example
# stockhistory <- "stockhistory" directory. There is one "stockhistory" directory
# -- 20070303 <- byDate directory There are multiple "byDate" directories
# ---- d <- byPeriod directory There are at most 3 ("m", "w", "d") "byPeriod" directories
# ------ A.csv <- tickerHistory file for ticker "A" on 20070303 with a daily period
# ------ AA.csv <- tickerHistory file for ticker "AA" on 20070303 with a daily period
# ------ ...
# ---- 20070304.csv <- dailyQuote file
#
#
# (antiquated): A "history" directory also contains "dailyQuote" files.
# Each "dailyQuote" file contains the statistics for every AAII db ticker for a particular day.
#===========================================================================================================
# downloadHistory - copy URL file to local file
# - takes a ticker symbol, gathers daily data from last -n- years to current day, stores in file
# Example:
# print downloadHistory("SAP", 4,"chart.yahoo.com","history/")
# returns the following header
# Date Open High Low Close Volume Adj Close
#
# print downloadHistory("ADS.DE",4,"chart.yahoo.com","history/")
# for DAX tickers. This is different from a few years ago.
# Should never be used directly by end user
#===========================================================================================================
Data | ||
historyPath = './stockhistory/' |
Analyst.AAII | index /Users/katherinepaseman/Documents/projects/py/Analyst/AAII.py |
# -*- coding: utf-8 -*-
#===========================================================================================================
# Copyright (c) 2006-2010 Paseman & Associates (www.paseman.com). All rights reserved.
#===========================================================================================================
# o AAII (American Association of Individual Investors) distributes a monthly CD containing
# corporate information, keyed by ticker symbol, with (as of 2006) 8,706 rows and 2,815 columns. (2009 - 9877 rows)
# o The AAII class holds a -read only- dictionary of a -stringified version- of this data
# - Tickers (which we refer to as securityNames) key 2,815 columns across the 14 AAII "segment" files
# (F01.txt,F01_Key.txt) thru (F14.txt,F14_Key.txt).
# The ticker (key) is in the first column in each file. Column names are unique across all segment files.
# Each value is a string and -cannot- be coerced. (see "buggy" coerceColumn below), use copyColumn instead
# - AAIIdb.idx provides a -read only- interface to this data keyed by 'securityName' and 'columnName'.
# Behind the scenes, it demand loads and "joins" AAII segments.
#
#===========================================================================================================
# Usage:
# db = AAIIdb(path) # initialize. load all path/F??_Key.TXT files
# db.idx("AA","MKTCAP") # For the AA securityName, get the company market cap. As a side effect, the first call
# # demand loads the segment containing the MKTCAP column for all 8,706 aaii rows.
# # In all, there are 2,815 aaii columns spread across 14 segments
# db.copyColumn("MKTCAP",float) # Create a dictionary, (key, value) = (securityName, indicated column)
# # The indicated column is coerced
#===========================================================================================================
# self.columns[columnName] = segmentId,segmentFileName,offset
# self.rowIds[securityName] = rowId
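The demand-loading behaviour described above can be sketched with an in-memory stand-in for the segment files. Everything here is hypothetical: `LazyColumnStore`, the loader callback, the 'F01' mapping and the sample values illustrate the pattern and are not the AAIIdb class or real AAII data.

```python
class LazyColumnStore(object):
    """Hypothetical sketch: 'segments' are in-memory dicts, not F01.txt..F14.txt files."""

    def __init__(self, column_to_segment, segment_loader):
        self.column_to_segment = column_to_segment  # columnName -> segmentId
        self.segment_loader = segment_loader        # segmentId -> {securityName: {columnName: str}}
        self.loaded = {}                            # segmentId -> loaded segment

    def _segment_for(self, columnName):
        segment_id = self.column_to_segment[columnName]
        if segment_id not in self.loaded:           # demand load on first use
            self.loaded[segment_id] = self.segment_loader(segment_id)
        return self.loaded[segment_id]

    def idx(self, securityName, columnName):
        """Return the stored string for one security and column."""
        return self._segment_for(columnName)[securityName][columnName]

    def copyColumn(self, columnName, coerce=str):
        """Return {securityName: coerced value} for one column."""
        segment = self._segment_for(columnName)
        return {name: coerce(row[columnName]) for name, row in segment.items()}


load_calls = []


def fake_loader(segment_id):
    """Stand-in for reading a segment file; records each load. Values are made up."""
    load_calls.append(segment_id)
    return {'AA': {'MKTCAP': '1234.5'}, 'T': {'MKTCAP': '99.0'}}


db = LazyColumnStore({'MKTCAP': 'F01'}, fake_loader)  # 'F01' is an assumed mapping
first = db.idx('AA', 'MKTCAP')
second = db.idx('T', 'MKTCAP')
caps = db.copyColumn('MKTCAP', float)
```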
#===========================================================================================================
# Useful columns: (See 20040206 Column Table.xls for a complete list as of 2004)
# MKTCAP - Market capitalization in millions
# EXCHG_DESC - Exchange Description - "A - American", "M - Nasdaq", "N - New York", "O - Over the counter"
# COMPANY - Company name
#
# MULTIPLES
# PE Price/Earnings per Share
# PBVPS Price/Book Value per Share
# PSPS Price/Sales per Share
# PCFPS Price/Cash Flow per Share
# PFCPS Price/Free Cash Flow per Share
#
# PROFITABILITY
# Gross Profit Margin
# Operating Margin
# Net Profit Margin
# Return on Assets
# Return on Equity
#
# LIQUIDITY
# Quick Ratio
#
# PRICE
# PRICE_M001 - PRICE_M120 - The closing price of the last trading day for each of the last 120 months.
# PRICEDM001 - The date of the last trading day for each of the last 120 months.
# PRICEHM001 - The highest price the stock has traded at in each of the last 120 months.
# PRICELM001 - The lowest price the stock has traded at in each of the last 120 months.
# PRICEVM001 - The total trading volume for a company’s stock for each of the last 120 months.
#===========================================================================================================