General Discussion
In reply to the discussion: Revealed: the top secret rules that allow NSA to use US data without a warrant
DallasNE
(7,404 posts)

I think we both have a little clearer picture now. I know I made a statement that was too definitive, since it relied too much on assumptions. Data mining, by the way, is pretty complicated and takes a lot of computing power, and that is why I am more dismissive than you appear to be. The only practical time to mine the data is in the same process that builds the database, which could be real-time, because doing it after the fact against six months of data is just impractical.

Data mining involves a process called Soundex that assigns numeric values to words and then goes plus or minus against the numeric value of the words being targeted. Keep in mind how many different languages and dialects there are in the PRISM data (and that is all we are talking about here). If these are targeted words, you would want to capture both "terror" and "terrer" but not "terrier", and "color" and "colour" but not "collar". In other words, simple misspellings and dialect variants but not similar words.

Get more sophisticated and you may need to look at the context of the word, since terrorists like to use code words to avoid detection. You would want to capture "recipe" when it is not in a food context while not capturing it in a food context. (You also would want to pick up someone saying "that would be a recipe for disaster".) You may even want different criteria for Google than for Twitter. Like I say, complicated. All of this takes massive computing power against the huge volume of data being captured.

Yes, I am a retired IT guy with some experience with Soundex. One application using it was directory assistance, where a caller wants the number for Dick Petersen on Greystone Road, the operator keys in Dick Peterson on Graystone Road, and up pop both Richard Petersen on Greystone Circle and Dick Peters on Greystone Circle. At least they should pop up, depending on the tightness of the Soundex match.
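
For anyone who wants to see the matching problem concretely, here is a rough sketch in Python. It assumes that "sound-ex" above means the classic American Soundex code (first letter plus three digits), which is matched on equality rather than plus-minus. The "tightness" step is purely an assumption for illustration: it uses a plain edit-distance limit, and the names `soundex`, `edit_distance`, `tight_match` and `max_edits` are made up for this example. It is not a description of how any real system, directory assistance or otherwise, does it.

```python
# Letter groups used by classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit:
            # Repeated codes collapse into one (e.g. the 'rr' in 'terror').
            if digit != prev:
                code += digit
            prev = digit
        elif ch not in "hw":
            # Vowels (and y) separate repeated codes; h and w do not.
            prev = ""
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def tight_match(target: str, word: str, max_edits: int = 1) -> bool:
    """Hypothetical 'tight' match: same Soundex code AND a small edit distance."""
    target, word = target.lower(), word.lower()
    return (soundex(target) == soundex(word)
            and edit_distance(target, word) <= max_edits)


if __name__ == "__main__":
    for target, word in (("terror", "terrer"), ("terror", "terrier"),
                         ("color", "colour"), ("color", "collar")):
        print(target, word, soundex(word), tight_match(target, word))
```

Plain Soundex by itself is too loose for the examples above: "terror", "terrer" and "terrier" all code to T660, and "color", "colour" and "collar" all code to C460. With the extra one-edit limit, "terrer" and "colour" still match while "terrier" and "collar" drop out. Context (the "recipe" problem) is a separate and much harder job that this sketch does not touch.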