GitHub: model directory
detect_ticker
- Purpose: Given a body of text, identify the set of tickers being discussed
- Target data: Reddit (or similar social media data)
- Logic: Refer to the GitHub code
- Statistical Reasoning:
  - Minimizing false positives is crucial: when a ticker coincides with a common word (e.g. auxiliary verbs, pronouns), ordinary uses of that word would be counted as mentions and significantly skew the results toward it
  - False negatives are more tolerable: given a larger sample of posts, a ticker missed in one post will hopefully be recognized correctly in other posts
  - Ticker attributes are derived from the North American market: tickers contain no digits and have a maximum length that is known in advance
# Python
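# Import the ticker detection function
from stock_market.model._classification import detect_ticker

# Sample post text to scan for tickers
text = "AAPL will have a fantastic run this year!"

# Detect the tickers mentioned in the text, with Reddit as the data source
print(detect_ticker(text=text, source="reddit"))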
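
The statistical reasoning above can be illustrated with a minimal sketch. This is not the repository's implementation (refer to the GitHub code for that). The names `sketch_detect_tickers`, `KNOWN_TICKERS`, `COMMON_WORDS`, and `MAX_TICKER_LEN` are hypothetical. The sketch assumes a maximum ticker length of 5 and that only upper-case tokens are considered. It deliberately drops any ticker that is also a common word, accepting false negatives in exchange for fewer false positives.

```python
import re

# Hypothetical reference data for illustration only; a real project would
# load the full ticker list and common-word list from its own sources.
KNOWN_TICKERS = {"AAPL", "TSLA", "GME", "ARE", "IT", "ALL"}
COMMON_WORDS = {"ARE", "IT", "ALL", "CAN", "HAS", "FOR"}
MAX_TICKER_LEN = 5  # assumed upper bound on ticker length

# Candidate tokens: whole words made only of upper-case letters (no digits),
# at most MAX_TICKER_LEN characters long.
_CANDIDATE = re.compile(rf"\b[A-Z]{{1,{MAX_TICKER_LEN}}}\b")


def sketch_detect_tickers(text):
    """Return the set of known tickers found in `text`.

    Tickers that coincide with common words are skipped on purpose,
    trading some false negatives for fewer false positives.
    """
    found = set()
    for token in _CANDIDATE.findall(text):
        if token not in KNOWN_TICKERS:
            continue  # not a listed ticker
        if token in COMMON_WORDS:
            continue  # too ambiguous: likely an ordinary word
        found.add(token)
    return found


if __name__ == "__main__":
    print(sketch_detect_tickers("AAPL will have a fantastic run this year!"))
```

In this sketch a token such as "GME3" is rejected because it contains a digit, and "ARE" is rejected even though it is in the hypothetical ticker list, because it is far more often an ordinary word.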