Page 74 - Cyber Defense eMagazine - November 2017

●  Caveat #1: Models need to be trained on the right data. To accurately differentiate between
   malware and "goodware," a model's training datasets need to include a diverse range of both.
   Otherwise, imbalances in sample types can produce biases. For example, models can become
   prone to false positives, classifying legitimate programs as malware and creating the need
   for overrides. This is especially true when organizations deploy their own software or
   custom-built third-party software.

●  Caveat #2: Models need to be constantly refreshed. Attacks evolve. New techniques and new
   malware appear constantly. As a result, machine learning models designed to detect malware
   gradually deteriorate over time. Their accuracy suffers, with new malware samples slipping
   past them and updates to legitimate software triggering false positives. To compensate,
   some vendors use whitelisting and blacklisting, which increases management costs and
   doesn’t solve the underlying problem. Accuracy can be restored only once a model is
   retrained on new samples. And then the cycle begins anew.
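
One way to make the deterioration described in Caveat #2 visible is to track a model's detection rate and false positive rate per time period and flag the periods where either drifts out of bounds. The following is a minimal sketch, not any vendor's actual method. The `Sample` record, the monthly buckets, the sample counts, and the 90% / 5% limits are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    month: int        # hypothetical time bucket in which the file was seen
    is_malware: bool  # ground-truth label (assumed to be known after the fact)
    flagged: bool     # what the model decided at the time

def rates_by_month(samples):
    """Return {month: (detection_rate, false_positive_rate)}."""
    buckets = {}
    for s in samples:
        b = buckets.setdefault(s.month, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if s.is_malware:
            b["tp" if s.flagged else "fn"] += 1
        else:
            b["fp" if s.flagged else "tn"] += 1
    rates = {}
    for month, b in sorted(buckets.items()):
        malware = b["tp"] + b["fn"]
        goodware = b["fp"] + b["tn"]
        detection = b["tp"] / malware if malware else 0.0
        fpr = b["fp"] / goodware if goodware else 0.0
        rates[month] = (detection, fpr)
    return rates

def months_needing_retraining(rates, min_detection=0.9, max_fpr=0.05):
    """List the months in which either rate crossed its (assumed) limit."""
    return [m for m, (d, f) in rates.items() if d < min_detection or f > max_fpr]

# Made-up data: month 0 is right after training, month 1 is some time later.
samples = (
    [Sample(0, True, True)] * 10 + [Sample(0, False, False)] * 10
    + [Sample(1, True, True)] * 7 + [Sample(1, True, False)] * 3
    + [Sample(1, False, True)] * 2 + [Sample(1, False, False)] * 8
)
rates = rates_by_month(samples)
stale = months_needing_retraining(rates)
```

With this made-up data, month 1 is flagged because new malware is being missed and updated goodware is being blocked, which is the point at which retraining, rather than another whitelist entry, addresses the underlying problem.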

With these caveats in mind, it’s worth noting that the adoption of machine learning for security
purposes is still in its early stages. As analysts point out, many models lack refinement and
currently serve as “coarse-grained filters” that are clearly over-sensitive, erring toward
classifying files as malware rather than goodware. That’s because the vendors behind them have
often faced a difficult choice between providing wider coverage (blocking more malware) and
providing more accurate coverage (making sure malware is the only thing getting blocked). In
those cases, wider coverage wins nearly every time. As a result, false positives have become
the accepted price of protection, even though they are well understood to be a prohibitive
barrier to effective roll-out and to come at considerable cost.
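
The choice between wider and more accurate coverage can be illustrated with a detection threshold. In the sketch below, "coverage" is read as recall (the share of malware that gets blocked) and "accuracy" as precision (the share of blocked files that really are malware). That mapping, the scores, and the labels are assumptions for illustration and do not come from any real product.

```python
def coverage_and_precision(scores, labels, threshold):
    """Return (recall, precision) when every file scoring >= threshold is blocked."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return recall, precision

# Hypothetical model scores (higher means "more likely malware") and true labels.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [True, True, True, False, True, False, False, False]

wide = coverage_and_precision(scores, labels, threshold=0.3)    # block aggressively
strict = coverage_and_precision(scores, labels, threshold=0.7)  # block conservatively
```

On this toy data the low threshold blocks every malware sample but also three legitimate files, while the high threshold blocks no legitimate files but lets one malware sample through. A model that separates the two classes more cleanly would reduce the gap between the two settings.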

As more security vendors turn their attention to successfully harnessing machine learning,
however, significant advances are being made that may eventually make that “necessary”
sacrifice a thing of the past.

                         Copyright © 2017, Cyber Defense Magazine,  All rights reserved worldwide.