GitHub: model directory

detect_ticker

  • Purpose: Given a body of text, correctly identify the set of ticker(s) being discussed
  • Target data: Reddit (or similar social media data)
  • Logic: Refer to the GitHub code
  • Statistical Reasoning:
    1. Minimizing false positives is crucial: many tickers coincide with common words (e.g. auxiliary verbs, pronouns), so matching them naively can significantly skew the results towards those words
    2. Given a larger sample of posts, a ticker missed in one post (a false negative) is likely to be recognized correctly in other posts
    3. Ticker attributes are derived from the North American market: tickers contain no digits and have a known maximum length
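
The reasoning above can be illustrated with a minimal sketch. This is not the repository's implementation (refer to the GitHub code for that); the function name `sketch_detect_ticker`, the maximum length of 5, and the small `KNOWN_TICKERS` and `COMMON_WORDS` sets are all hypothetical values chosen for illustration.

```python
import re

# Hypothetical values for illustration only; the real lists and the real
# maximum length live in the repository code.
MAX_TICKER_LENGTH = 5
KNOWN_TICKERS = {"AAPL", "TSLA", "GME", "ARE", "IT"}
COMMON_WORDS = {"ARE", "IT", "ALL", "FOR", "ON"}

# Candidates are runs of uppercase letters only (no digits), no longer than
# the known maximum, and not glued to other letters or digits.
CANDIDATE = re.compile(
    rf"(?<![A-Za-z0-9])([A-Z]{{1,{MAX_TICKER_LENGTH}}})(?![A-Za-z0-9])"
)


def sketch_detect_ticker(text):
    """Return the set of known tickers mentioned in `text`."""
    found = set()
    for match in CANDIDATE.finditer(text):
        symbol = match.group(1)
        if symbol not in KNOWN_TICKERS:
            continue
        # Prefer a false negative over a false positive: a ticker that is
        # also a common word is dropped rather than risk a wrong match.
        if symbol in COMMON_WORDS:
            continue
        found.add(symbol)
    return found
```

In this sketch, tickers that double as common words are always discarded, which trades recall for precision in line with point 1. A real implementation would need some way to recover them, as point 2 relies on other posts mentioning the ticker in a recognizable form.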
# Python

# Import module
from stock_market.model._classification import detect_ticker

text = "AAPL will have a fantastic run this year!"
print(detect_ticker(text=text, source="reddit"))