Aug 04

Changing passwords frequently reduces security

I’ve been saying this for a few years now. There is no reasonable threat for which this is a good counter-measure. And now the Chief Technologist at the FTC agrees with me:

 

http://arstechnica.com/security/2016/08/frequent-password-changes-are-the-enemy-of-security-ftc-technologist-says/

Jun 16

Penguicon 2016 Report

As in previous posts on this topic, I want to emphasize that this is not a review of the entire range of Penguicon, but simply a report on what I did and the panels and presentations I attended. Since I was responsible for the Tech Track, you shouldn’t be surprised to find that most of what I did was tech-related, but with the variety of things on offer at Penguicon I managed to get into a few other things as well, including a panel I joined.

Day 1: Friday April 29, 2016

I work not too far from the Westin Hotel, so after leaving work and catching dinner, I got to the hotel and went through registration to pick up my badge and my participant materials. I was just in time for the Opening Ceremonies at 6pm. When you have been working with a group of people all year to put on an event like this, it feels great to see it all come together. All of the Guests of Honor (GOH) were introduced, and this year my friend Deb Nicholson was one of them.

Then I attended a keynote presentation by George Gage demonstrating the electrical activity of nerves using large cockroaches as experimental subjects. He removed a leg and attached electrodes to it. Then he stimulated the leg and we could view the electrical activity of the nerves. BTW, the cockroaches can grow back a leg when they molt, so this was not quite as cruel as it may seem. From there I went to the new space we had this year in the Executive Meeting Center adjoining the hotel for a presentation on pandemics and deliberately making diseases extinct. We focused on polio and Guinea Worm (a parasite). And I note that at last report there were only two cases of Guinea Worm in the entire world, so we are close to eliminating this pest, thanks to the Carter Center.

Then Pat Baker did a great presentation on the Dark Web Big Three: Tor, I2P, and Freenet. I am always looking to schedule good talks on ways people can protect themselves online, and Pat’s talk was definitely a good one. After this I went searching for the Ubuntu Release Party, but it was nowhere to be found. On Saturday I caught up with Craig Maloney, who explained that the party had been earlier in the evening; the program booklet had the wrong time in it. So shortly after 10pm, at the end of a full week at work, I headed for home.

Day 2: Saturday April 30, 2016

I headed back to the hotel in the morning and took advantage of the breakfast buffet to fortify myself for a full day of con activities. To start things off, Jer Lance and Dawn Kuczwara discussed running a technical team and the difference between a manager and a leader. As a Project Manager that is something I have to do a certain amount of, and I really liked their discussion. Then it was off to the Fedora Roadmap. Tom Callaway of the Fedora project has become a regular at Penguicon (along with Ruth Suehle) and I had specifically requested he do this to expand on our Linux offerings. After all, we have Penguin in our name, and we should have a solid group of Linux presentations. Because we have at least 4 Canonical employees in Michigan (that I know of) Ubuntu has always been well covered, but I want to expand that. And I already have Tom working on a Red Hat challenge for next year. We will have some computers set up with deliberate errors in their configuration and see who can diagnose and fix them the fastest. It should be fun. We initially wanted to have this competition this year, but we got the idea too late to pull it together. BTW, Tom and Ruth also did their Raspberry Pi Hacks talk again this year.

Then I went to The Reality and Fiction of Artificial Intelligence, with Ann Leckie, Jason Mars, Lingjia Tang, and Jennifer Marsman. I know this has been in the news a lot, and frequently the focus is on whether AI is a danger; these folks didn’t think we were anywhere near a “robot uprising”. Ann is the author of the Hugo Award-winning novel Ancillary Justice; Jason and Lingjia are computer scientists at the University of Michigan who are working in AI; and Jennifer Marsman is a Microsoft evangelist for the Azure Cloud who also did graduate work in AI at the University of Michigan, and helped us find the right people to contact there.

I then went to the Penguicon Board meeting. I am not a Board member, but the meeting was open and I wanted to see what was going on. The Board provides the long-term continuing management over a number of years, and each year they pick a Con Chair to put the event together and give that person a budget. The Con Chair puts a team together, and I am a part of that team. This year the Con Chair was Scott Kennedy (great job!), and for 2017 it is Cylithria (Lithie) DuBois, who I am sure will also do a great job.

From here I went to a presentation on The Works of Miyazaki. Hayao Miyazaki is the legendary Japanese anime artist, and my wife and I are big fans. This presentation focused on his earlier works and was very nice. After this, Jennifer Marsman did a wonderful presentation, Fun With Mind Reading. This combined EEG recording with Machine Learning in Azure to do, essentially, a kind of lie detection. The idea was to ask a series of questions to which the subject would give a truthful answer, and then the same questions again, this time with the subject lying. The Azure Machine Learning would learn the mental pattern of a truthful answer and the pattern of a lie, and could then offer a pretty accurate classification of the answer to a new question.

At 4pm I went to How Will Technology Change Society? This panel had Deb Nicholson, our GOH from Open Invention Network, Jason Mars and Lingjia Tang from the University of Michigan, Tobias Buckell, a science fiction author, and Edward Platt, one of our Tech Track presenters. This panel got into a number of topics, such as personal freedom in an age of surveillance, and wound up in the area of automation taking away jobs. This was a natural lead-in to our panel the next day called Post-Capitalism so I made sure to invite everyone to come to that.

After this it was time for a break to grab something to eat and to peruse the dealer tables. Everyone who is registered gets access to the Con Suite, where there is food and drink available. That is part of what your registration money buys, and it includes beer for those who want something more adult. And the dealer tables deserve a look since you never know what you are going to find there. Books, jewelry and costumes are always there, but then there are the interestingly different items, which this year included a variety of soaps.

Then it was time for Krunal Desai to do a presentation called The Tech Behind Asteroid Mining. Krunal ran the tech track back when I was a presenter, then went to work as a senior avionics engineer for Planetary Resources, so his presentation was definitely from the front lines. We also paired him with Bob Trembley the astronomy guy on the Science track for a joint panel on asteroid science. And after that I went to Webcomics 101: Logistics, with Erika Wagner and Laura Cascos. I enjoy Webcomics, and I found it interesting to look at the issues involved in producing a comic. They are the people behind Sidekick Girl, which I recommend. I had considered following this with Night Sky Observing with Bob Trembley, but the weather was not accommodating so I decided it was time to call it a day.

Day 3: May 1, 2016

After another breakfast buffet at the hotel, I started my last day of Penguicon with Michael Rometty, who did A Look At LibreOffice Base. Michael has a YouTube channel under the name The Frugal Computer Guy, with videos from several series that I recommend highly. He focuses on Linux and LibreOffice and does a great job of introducing them to new users. I had been watching his videos on YouTube for a while when I discovered last year that he lives in this area, so of course I wanted a talk from him, and I will ask him back next year.

I then went to the planning meeting for next year. We have a number of new faces on the committee as people drop off and get replaced by others, and it does take a lot of work to put together, so while this year’s committee is winding down, next year’s is gearing up.

Following this, Susan Sons did Security Principles for System Administrators. Susan is one of those gems who comes back year after year and gives great presentations. She is a security professional, her talks are all security-related, and they are always worth attending. She is one of those people I always make a point of contacting when I am planning the Tech Track.

Then came the panel I was on, Post-Capitalism. Matt Arnold and Ed Platt joined me for this, and we looked at how economies evolve and what may come next. A particular focus was on the job market since more and more stuff is getting automated.

After a break for lunch, I attended Ed Platt’s presentation on Free/Open Democracy. He looked at some of the tools available, such as Loomio, Liquid Feedback, and Intertwinkles. These tools help you to create an environment where decisions can be consensus-based and democratic.

And finally, it was time for the Closing Ceremonies. Usually everyone is fairly tired by this point, but there is also a kind of manic energy. Prizes are awarded for room parties and for costumes, volunteers are thanked, and GOHs say a few final words about their experience. Penguicon 2016 was certainly a great event. We had great guests and a record attendance that went over 1600 (we don’t know how much over, since Registration stopped mid-day on Sunday and some people were still coming in). For the first time we had a computer lab at Penguicon, and the computers have been stored away for use next year. We could probably make better use of them, particularly in the Tech Track, and I will try to follow up on that for next year. But everyone I talked to had a great time and will be back next year.

May 10

Penguicon 2015 Report

Penguicon 2015 was a great success, and far more happened than I could have been part of. We had about 500 hours of programming, and even my track, the Tech Track, had 100 hours of content. So this report is my own diary of my particular experience of Penguicon 2015. Each person would have their own experience based on the events, panels, and talks they chose to attend. But with this much programming there was plenty for everyone to enjoy. And since most of the team that put this together is coming back for another year, and we have managed to add some more people, I think next year can be even better.

On Friday 4/24/15 I left my office in Dearborn, Michigan, grabbed dinner, then checked in at the event registration desk. I arrived in time for the official opening ceremony at 6pm. Of course, there were talks and events scheduled even earlier on Friday (with 500 hours of stuff to fit in, we used every possible slot), but I was happy to get going then. These opening ceremonies are about introducing the Guests of Honor, advertising events of the upcoming weekend, and so on. And right after the opening we had the first of our two keynotes, Aral Balkan. He is the designer and co-founder of ind.ie. He is very passionate about security and privacy, and his keynote focused on the dangers posed by companies like Google and Facebook who know everything about us. He is trying to create alternatives that really protect our privacy.

Aral was followed by our second keynoter, Bruce Schneier, the security expert, who gave a talk that fit very well with Aral’s. Bruce had just published a book called Data and Goliath, which explores the problems of the mammoth data collection happening every moment. Bruce referred to data as the pollution problem of the 21st century. This becomes an interesting problem because, as he noted (similarly to Aral), the biggest source of data collection is private companies, and stopping them might require government action. And the problem with that is that governments are generally happy to have companies do their data collection for them. Bruce thinks this can be resolved with the right legislation, but noted futurist and science fiction author David Brin disagrees, and says that the only thing that can work is radical openness, where ordinary people can look at the government just as much as the government looks at us. I lean more to Dr. Brin’s view myself, but no matter which side you come down on, this is a big deal for all of us. As part of having Bruce there we arranged to have copies of his book, which he kindly signed for anyone who wanted one. When our Con Chair said he had arranged to have 25 books on hand, I immediately told him to double the order, which was a good thing; we would have angered a bunch of people otherwise. We sold all but 6 copies.

While the book signing was going on, I was introduced to Fifty OneFifty, who came out from Kansas to see what Penguicon was all about. I was so focused on getting my book signed that I didn’t spend a lot of time with Fifty right then, but I had more opportunities over the weekend. He did try to get an interview with Bruce for Hacker Public Radio, but unfortunately Bruce could only be at Penguicon for Friday evening, so there really wasn’t time.

After that I went to a panel on Welcome to Night Vale, a semi-monthly podcast that I love. I recommend it to anyone interested in an offbeat podcast about a fictional town that has been described as Stephen King meets Lake Wobegon. But about halfway through I could tell I was running out of gas, so it was time to go home.

On Saturday morning I moderated a panel on Getting Involved in the Open Source Community. On the panel I had Ruth Suehle and Tom Callaway from Red Hat, Emily Gonyer from the Gnome Project, and William A. Rowe from the Apache project. It was fun to have all of these people sharing their experiences, and in particular to point out that most open source projects need a lot more help than just coders. So if anyone wants to contribute there are plenty of ways. For my part, I have done things like review documentation for the LibreOffice project, which really means taking a chapter and going through it with the software open in front of me and just verifying that each instruction works the way they say it does, and that the instructions make sense. That is something anyone can do, and lots of projects need people to do things as simple as that.

After this I had a nice hallway talk with Susan Sons, who does our Cryptoparty each year. She has lots of ideas of things we can do to improve, so I enjoy talking to her. One thing we discussed that I definitely want to bring in next year: She and Eric Raymond are looking at some of the base “plumbing” software that we all depend on but which is maintained by one or two aging developers (kind of like what happened to OpenSSL earlier), and they are working to develop good support models.

Then I went to a MariaDB talk by Colin Charles, a developer on the MariaDB team who flew in from Malaysia to tell us what MariaDB is doing now. This project forked off from MySQL after Oracle took ownership and messed things up, and is now the default choice on most Linux distros. By a happy coincidence, just about the time I heard about Colin joining us I read a post from my friend Jorge Castro of Canonical about how MariaDB was now integrated into their Juju cloud solution, so I signed up Jorge right away and turned this session into a two-hour presentation incorporating both MariaDB and Juju. So a lot of awesome cloud goodness here. But that was not all. I followed this with a talk from Jennifer Marsman of Microsoft. It was taking a bit of a chance, but she knew this was an open source convention, so she presented Azure and emphasized all of the open source software that was ready to run on that platform: Linux distros, Hadoop, Apache Ant, Drupal, and so on. Her talk was very well received and I plan to invite her again next year.

After this was my second panel, this time on Creative Destruction. Mark Haynes put this together, and when he asked me if I knew any economists I decided I should be one of the participants. I looked at the origin of the term as used by Joseph Schumpeter and some of its implications. There were others on the panel looking at ecology, biology, and other sciences as well. So I actually participated in something on the Science Track, which was a first for me. But after that I left the con to get home, since my wife and I had tickets for the Symphony, and I try not to let anything keep me away from that, particularly when Mahler is on the program. (Though to be fair, they do an excellent job on most things.)

On Sunday I got there in time for the recording of the Sunday Morning Linux Review. They always do a live recording at Penguicon, and this year Fifty OneFifty was a guest on the show. This went well as always, and I won bragging rights for the trivia quiz, which took the place of Mary’s usual Is It Alive feature. Then I went to the What’s New in KDE 5 talk, but unfortunately Ryan had to take his wife to the hospital, so we had some discussion among ourselves in the room for a while. (I did check later, and Emily seems to be doing fine.) Then I decided to get some breakfast at the buffet in the hotel.

Next up was Mary Tomich’s talk Swimming with Dolphin, the KDE File Manager. I had been looking forward to this talk since both Mary and I are KDE users. She did not disappoint: it was a great talk, and I learned a lot. As a result I made a commitment to try using Dolphin (up to now I have been strictly a Krusader user), and if I can do everything I need to do there I may switch.

Then I went to one of the Science Fiction panels, Science Fiction Is Now Science Reality. This panel had Karl Schroeder (a previous GOH), Annalee Newitz of Gawker (2015 GOH), and Charlie Jane Anders of io9 (2015 GOH). We had a very interesting discussion of where science and technology are taking us, and even managed to bring in some optimism. Like many of us, I am no longer interested in reading about dystopias, and sometimes it seems like all current SF is nothing else. I read SF as a kid when it was all about how awesome the future would be, and that is what I want to read now. Karl and I had a very nice breakfast together on a Sunday morning at Penguicon a few years ago when he was GOH, and I was happy to renew this slight acquaintance.

Then it was on to Firewalls with pfSense by Tom Lawrence. I had just met Tom earlier this year, and I am glad I was able to sign him up as a presenter. His talk was really good, and I heard a lot of compliments from other attendees. When Tom’s talk was over, Tony Bemus grabbed me, and I joined the Sunday Morning Linux Review team to do a wrap-up recording on what we saw at Penguicon 2015. And then it was on to the closing ceremonies, which were liberally punctuated by the firing of a T-shirt cannon. Prizes were also awarded for best room party, volunteers were recognized, and so on.

So, that was my personal experience of Penguicon 2015. I am already registered for next year, and have made the commitment to stay on as the Tech Track head, so I will try to do as good a job next year. I don’t think we could handle any more content than we had, but I am working on bringing in some specific talks that I think will attract people with a tech interest.

Oct 09

Penguicon 2015 Call for Talks

I am the coordinator for the Tech Track at Penguicon 2015, a combined FOSS/Science Fiction convention held every spring in the Metro Detroit area. The 2015 event will happen April 24-26 at the Westin Hotel in Southfield, MI. The theme for the upcoming year’s event is Biotechnology and Medicine, looking at how technology is affecting our health and lives. But we want a lot of different talks as well, so I will be happy to accept proposals that look at things like cloud computing, security, hardware hacks, and anything else that would be of interest to geeks and hackers.

This year we are experimenting with a new process to gather talk proposals that will also allow some “social” features, such as letting prospective attendees vote on talks they would attend. This helps the programmers to know what talks might be of the greatest interest and thus program the most desirable talks. I know I look at this information in setting my track, and I believe the other track coordinators do something similar. This is something called TuxTrax, and as it is developed it should become even more useful. But if you encounter any problems at all, please do not hesitate to contact me directly. My official Penguicon e-mail is tech at penguicon dot org.

May 12

Penguicon 2014

I have recovered from the weekend and this seems like a good time to recap my own experience of Penguicon 2014, my first as part of the team.

I had been associated with Ohio Linux Fest for a few years, but Penguicon actually came first. In fact, I was recruited for OLF by Beth Lynn Eicher while at Penguicon where I had engaged with Jorge Castro of Canonical on the importance of working with Linux Users Groups. And while I enjoyed my work with OLF, I felt like I had done what I wanted to do there and that it was time to move on. So I contacted my friend James Hice and offered to “help” with the Tech Track. But sometime in February it became clear that I was “The Tech Track Guy”. Fortunately, I was not upset by this development since I had things in mind to do, and I have already indicated my willingness to continue in that role for another year. And I am excited that pretty much the whole Penguicon team is continuing, which makes the upcoming year exciting. I will no doubt have more to say on that as time goes by, but this is about 2014.

My focus is really on the Tech Track because I was responsible for that (I only attended one talk outside of it, in fact.) So my own recollection does not cover everything that went on over the weekend. And because we had approximately 70 hours of Tech Track programming I could not attend every talk in my own track. So when I got the complaint from others that they wanted to attend two talks that were scheduled opposite each other, I could sympathize, but frankly that is the kind of problem I will take any day of the week. The other “good” problem to have was that in many cases the rooms were packed. Penguicon had record attendance this year, which of course means we have to figure out how to do better next year. The host facility was excellent, the Westin Hotel in Southfield, Michigan, and everyone I talked to had good things to say about them. I hope we can go back next year.

I think this year was the best Tech Track in my memory at least, and the credit for that goes to our Con Chair, Nuri Gocay. Every time I asked how many talks he wanted, the answer was that there could not be too many Tech talks. We had two days’ worth of slots (from Friday afternoon to Sunday afternoon), and by the end it was tricky just finding the rooms and equipment to make it happen, but we mostly managed to do it, and the result was about 70 hours of pretty good Tech programming. Our focus for the Tech Track was on the security and privacy issues raised by Edward Snowden, and we had some good programming to address different aspects of this. Michael Lucas did several talks that fit in here, including one on the proper way to set up sudo, another on DNSSEC, and one on ssh key authentication. Michael is an author, and has written books on all of these, which he sells from his site Tilted Windmill Press. He offered a special deal to Penguicon attendees that weekend to buy a bundle of all three e-books for $20, and I was happy to take advantage. I will happily have Michael back again, as he really knows his stuff.

Then my friend Mark Stanislav from Duo Security gave us a talk on two-factor authentication. Mark is a great speaker and I always learn something from his talks. And Susan Sons ran the Cryptoparty, Penguicon Edition, with help from Eric Raymond, Chris Nehren, and John D. Bell. This provided a beginner’s guide to using encryption, and I was glad to have it. I also did a presentation with my friend Tony Bemus of the Sunday Morning Linux Review on the subject of encryption, which was well received. People who have followed the series on this site already know the kinds of things we covered, such as using plugins for Thunderbird and Gmail, but there are always people who haven’t seen this before and need the information.

Chris Krieger, another old friend, did a presentation on Securing Your Home Network with a Hardware Firewall. Chris focused on using pfSense, and I think I will be investigating this for my own series at some point. And for next year we suggested a presentation on Snort. Mark Kikta did a very good presentation on Linux Dorking: Exploring the Basics of Linux From the Eyes of an Attacker. Mark is yet another professional in the security industry, like Chris Krieger, Mark Stanislav, Michael Lucas, and Susan Sons, and that is what makes the presentations so good, in my view: you know you are getting the information from people who live this every day and know what they are talking about.

Aside from the purely technical aspects of security and privacy, we also addressed the policy aspects. Among our Guests of Honor this year was Eva Galperin of the Electronic Frontier Foundation, and Cory Doctorow, noted author, was a Featured Guest. They addressed several aspects, but one notable panel was The NSA is Watching You: The Government, Surveillance, and You. We also had a Guest of Honor who covered multiple areas, YT Cracker. He was primarily a musical guest, but also has a background in hacking that let him contribute to panels in that area. And while I am on Guests of Honor, let me mention Ernie Cline, author of Ready Player One, Ed Mason from Gameface Labs, who showed off a virtual reality headset throughout the weekend in addition to joining panels, and Erika Carlson, a software developer who created Girl Develop It, a Detroit area group that focuses on helping girls become coders. It was an excellent group of GoHs.

There were some other things in the Tech Track worth mentioning, and in fact the Security and Privacy was probably less than 50% of the total here. Jorge Castro from Canonical gave a presentation on Ubuntu, and another on building a Steambox. We had an Ubuntu release party on Saturday night, put on by the Ubuntu MI Loco. Sunday Morning Linux Review did a live recording of their podcast, something that has become an annual event at Penguicon. And Ruth Suehle and Tom Callaway of Red Hat gave two presentations, one on Raspberry Pi Hacks which is the title of their recent book published by O’Reilly, and another on 3D printing using Linux.

In addition, there were talks on HTML5, Python, Pascal, the Watson computer from IBM, programming for total noobs, and I could keep going. Just reporting on everything that went on would make this article way too long, so I have tried to focus on the things I actually attended. But the thing about Penguicon is that it is also a Science Fiction convention, which means you have literature panels, music sessions, costuming and cosplay, and everything that comes with that. And I believe Ruth Suehle from Red Hat won a contest for best costume. The one talk I attended outside the Tech Track was a panel on the Future of Health Care, which is a personal interest for me, having worked for two hospitals in my career.

So I would encourage everyone to plan on attending Penguicon 2015. As soon as the date is confirmed I will surely be posting about it, but with the team we have I think it will be even bigger and better than 2014, and there is plenty there for everyone.

Apr 29

Turning Off Comments

I wish this were not necessary, but the fact is that virtually all of the comments that people attempt to post on my site are spam. I was manually moderating, but it seems like a waste of time at this point. If you are a real person you can probably figure out how to reach me.

Dec 13

Statistics and Polling

This is a bit of a change of pace, but I got some inquiries about this and thought I would offer my own two cents on something that often confuses people. My qualifications for this are two-fold:

  1. In my past life I was a professor who taught classes in Statistics;
  2. I have worked for a political consulting company that among other things performed polling for clients.

So you can use this in deciding if you want to pay any attention to what I have to say on the subject. 🙂

To get started, the basic question is one of epistemology: how do we know what we say we know? In the case of statistics, the basic mathematics was first developed as a way of analyzing gambling. When you play poker, a hand with three of a kind beats a hand with two pair because two pair (which shows up 4.75% of the time) is more likely than three of a kind (which shows up 2.11% of the time). After its start in gambling, statistics took a big step during the Napoleonic wars, when for the first time large armies met and the casualties mounted up. Some doctors realized that gathering evidence about wounds and their treatment would lead them to select the best treatments.

But the key factor is that this is all based on probability. And the best way to think about probability is to think about what would happen if you did the same thing over and over. You might well get a range of outcomes, but some outcomes would show up more often than others. This is the first thing that throws a lot of people, because they often have the sense that if something is unlikely, it won’t happen at all. That is simply untrue. Unlikely things do happen, just not as often. As a joke has it, if you are one in a million, there are 1,500 people in China exactly like you. The heritage of gambling persists in the technique called Monte Carlo simulation, which runs an experiment many, many times, usually via a computer algorithm that generates random data to test theories. John von Neumann understood the significance of this approach, and programmed one of the first computers, ENIAC, to carry out Monte Carlo simulations.
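To make the Monte Carlo idea concrete, here is a short Python sketch of my own (not anything from a polling firm or a statistics text, just an illustration) that deals random five-card poker hands and counts how often two pair and three of a kind turn up. The estimates should land close to the exact 4.75% and 2.11% figures:

```python
import random
from collections import Counter

def classify(hand):
    """Classify a 5-card hand by its rank counts (enough for this comparison)."""
    counts = sorted(Counter(rank for rank, suit in hand).values(), reverse=True)
    if counts[:2] == [2, 2]:
        return "two pair"
    if counts[0] == 3 and counts[1] == 1:  # excludes full houses, which are [3, 2]
        return "three of a kind"
    return "other"

# A standard 52-card deck: 13 ranks x 4 suits.
deck = [(rank, suit) for rank in range(13) for suit in range(4)]

random.seed(42)  # fixed seed so the run is repeatable
trials = 200_000
tally = Counter(classify(random.sample(deck, 5)) for _ in range(trials))

print(f"two pair:        {tally['two pair'] / trials:.4f}  (exact 0.0475)")
print(f"three of a kind: {tally['three of a kind'] / trials:.4f}  (exact 0.0211)")
```

Increasing `trials` tightens the estimates toward the exact values, which is exactly the repeat-it-over-and-over idea at work.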

The next key concept is called the Law of Large Numbers, which in layman’s terms says that if you repeat an experiment many times, the average result should approach the expected result. Note that it is the average we are talking about here. Any particular experiment could give weird results that are nothing like the expected result, and that is to be expected in a distribution of results. But when you average across experiments, the occasional high results are offset by the occasional low ones, and the average is pretty good. To get this, though, you need to repeat the experiment many, many times: the more repetitions, the closer your average should be to the expected value.
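A die roll makes a simple demonstration (again a sketch of my own, nothing fancy assumed): the expected value of a fair six-sided die is 3.5, and the average of many rolls settles steadily closer to it as the number of rolls grows.

```python
import random

random.seed(0)  # fixed seed so the demonstration is repeatable

def average_roll(n):
    """Average of n rolls of a fair six-sided die; the expected value is 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Small samples wander; large samples settle near 3.5.
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7,}: average = {average_roll(n):.3f}")
```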

Our third key concept is Random Sampling. This says that every member of a population has an equal chance of being selected for the sample. The population is whatever group you want to make a claim about. If you want to make a claim about left-handed Mormons, your sample should exclude right-handed people and Lutherans, but it should give every left-handed Mormon an equal chance of selection. This is where a lot of problems can arise. For instance, many medical studies in the 20th century included all or mostly men, but the results were applied to all adults; this is now recognized as a big problem in medicine. When this happens we call the problem sampling bias.
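You can see sampling bias in a toy example (the groups and percentages here are entirely made up for illustration): suppose 70% of one group supports a measure but only 30% of another, and a pollster only manages to reach the first group.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 60,000 people in group A (70% support a measure)
# and 40,000 in group B (30% support it). True overall support is about 54%.
population = ([("A", random.random() < 0.70) for _ in range(60_000)] +
              [("B", random.random() < 0.30) for _ in range(40_000)])
true_rate = sum(support for _, support in population) / len(population)

# A proper random sample: every member has an equal chance of selection.
fair = random.sample(population, 1_000)
fair_rate = sum(support for _, support in fair) / len(fair)

# A biased sample: only group A gets surveyed (think landline-only polling).
group_a = [person for person in population if person[0] == "A"]
biased_rate = sum(support for _, support in random.sample(group_a, 1_000)) / 1_000

print(f"true support:  {true_rate:.1%}")
print(f"fair sample:   {fair_rate:.1%}")    # lands close to the true rate
print(f"biased sample: {biased_rate:.1%}")  # way off, up near 70%
```

The fair sample is off by a point or two; the biased one misses by fifteen, and no amount of extra interviews in group A would fix it.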

So, with these basic concepts (and see, I did not use any math yet!) we can start to look at polling, and just how good it is or isn’t as the case may be. And it is often very good, but history does show some big blunders along the way.

The first thing to get out of the way is that sampling, done properly, works. This is a mathematical fact and has been proven many times over. You may have trouble believing that 1,000 people are an accurate measure of what a million people, or even 100 million people, will do, but in fact it works. When there are problems it is usually because someone made a mistake, such as drawing a sample that is not truly an unbiased sample from the population in question. This does happen, and you need to be careful about it when examining polling results. In the earlier part of the twentieth century some polls were done via telephone surveys, but because telephones were not universally available at that time, these polls overstated the views of more affluent people, who were more likely to have phones. By the latter part of the century, telephone surveys were perfectly valid because almost everyone had a phone (and the few who didn't were not likely to be voters anyway). But now we have a different problem: many people (myself included) have gone to using mobile phones exclusively, while the sampling methods in many cases relied solely on landline telephones. Polling outfits are beginning to adjust for this, so it should not remain a problem. But you need to watch out for ways pollsters limit the sample. A big issue is whether you should include all registered voters (in the U.S. you need to be registered before you can vote; I am not familiar with how other countries handle this), or limit it to "likely voters". Deciding who is a "likely voter" is a place where some serious bias can creep in, since it is purely a judgment call by the pollster.

So how do we know that samples work? We have two strong pieces of evidence. First, we know from Monte Carlo simulations how well samples compare to the underlying populations in controlled experiments. You create a population with known parameters, pull a bunch of samples, and see how well they match up to the known population. Second, we have the results of many surveys which we can compare to what actually happens when an election (for instance) is held. Both of these give us confidence that we understand the fundamental mathematics involved.

The next concept to understand is the Confidence Interval. This comes from the fact that even an unbiased sample will not match the population exactly. To see what I mean, consider what happens if you toss a fair (unbiased) coin. If it is a truly fair coin, you should get heads 50% of the time, on average, and tails 50% of the time. But the key here is "on average". If you tossed this coin 100 times, would you always get exactly 50 heads and 50 tails? Of course not. You might get 48 heads and 52 tails the first time, 53 heads and 47 tails the second time, and so on. If you did this a whole bunch of times and averaged your results, you would get ever closer to that 50/50 split, but probably never hit it exactly. What this means is that your results will be close to what is in the population most of the time, but terms like "close" and "most of the time" are very imprecise. How close, and how often, really should be specified more precisely, and we can do that with the Confidence Interval. This starts with the "how often" question, and the standard usually used is 95% of the time. This is called a 95% confidence level, though sometimes the complement is used and it gets referred to as "accurate to the .05 level". These are essentially the same thing for our purposes. And if you are a real statistician, please remember that this podcast is not intended to be a graduate-level statistics course, but rather a guide for the intelligent lay person who wants to understand the subject. The 95% level of confidence is somewhat arbitrary, and in some scientific applications it is raised or lowered, but in polling you can think of it as the "best practice" industry standard.
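A simulation makes the coin-toss example concrete. If you repeat the 100-toss experiment many times, roughly 95% of the runs land within about ±10 heads of the expected 50. This is a rough sketch; the ±10 band corresponds to about two standard deviations for 100 fair tosses:

```python
import random

def heads_in_100_tosses(seed):
    """Count heads in one experiment of 100 fair coin tosses."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(100))

results = [heads_in_100_tosses(s) for s in range(10_000)]

# What fraction of experiments fell within +/-10 heads of the expected 50?
within = sum(40 <= heads <= 60 for heads in results) / len(results)
print(f"{within:.1%} of experiments landed between 40 and 60 heads")
```

Individual runs of 44 or 57 heads are entirely normal; it is the distribution across many runs that is predictable.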

The other part, the "how close" question, is not at all arbitrary. It is formally called the Margin of Error, and once you have chosen the level of confidence, it is a pretty straightforward function of the sample size. In other words, if you toss a coin ten times, getting six heads and four tails is very likely. But if you toss it 100 times, getting 60 heads and 40 tails is much less likely. So the bigger the sample size, the closer it should match the population. You might think that pollsters would therefore use very large sample sizes to get better accuracy, but you run into a problem. Sampling has a linear cost: if you double the sample size, you double the cost of the survey. If that resulted in double the accuracy it might be worth it, but it doesn't. The margin of error shrinks with the square root of the sample size, so doubling the sample only narrows the margin by about 30%, and is that worth spending twice the money? Usually not. So you are looking for a sweet spot where the cost of the survey is not too much, but the accuracy is acceptable.
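The standard 95% margin of error for a polled proportion is easy to compute, and the diminishing returns of bigger samples fall right out of the formula. A sketch, using the conventional worst case of a 50/50 split:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Each doubling of the sample size shrinks the margin, but ever more slowly.
for n in (500, 1_000, 2_000, 4_000):
    print(f"n = {n:>5}: +/- {margin_of_error(n) * 100:.1f} points")
```

Going from 1,000 to 2,000 respondents doubles the cost but only narrows the margin from about ±3.1 to about ±2.2 points; quadrupling the sample is what it takes to halve the margin.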

Any reputable poll should make available some basic information about the survey. The facts that should be reported include:

  • When the poll was taken. Timing can mean a lot. If one candidate was caught having sex with a live man or a dead woman, as the joke has it, it matters a lot whether the poll was taken before or after that fact came out in the news.
  • How big a sample was it?
  • What kinds of people were sampled? Was there an attempt to limit it to likely voters?
  • What is the margin of error?
  • What is the confidence interval?

Now a reputable pollster will make these available, but that does not mean they will be reported in a newspaper or television story about the poll. Or they may be buried in a footnote. But these factors all affect how you should interpret the poll.

Example: http://www.politico.com/story/2013/12/polls-obamacare-100967.html

In this brief news report we don't get everything, but we get a lot of it. This story is about two polls taken (as I write this) on people's opinions regarding "Obamacare".

The Pew survey of 2,001 adults was conducted Dec. 3 to Dec. 8 and has a margin of error of plus-or-minus 2.6 percentage points.

The Quinnipiac survey of 2,692 voters was conducted from Dec. 3 to Dec. 9 and has a margin of error of plus-or-minus 1.9 percentage points.

What I would note is that the first poll says it was a poll of "adults", while the second poll was one of "voters". That makes me wonder about any differences in the results (and the polls did indeed have different results). They were sampling different populations, so the results are not directly comparable. If the purpose of the survey is to look at how people in general feel, a survey of adults probably makes sense. If the purpose is to forecast how this will affect candidates in the 2014 elections, the second poll may be more relevant.

Second, note that the survey with the larger sample size had a slightly smaller margin of error. That is what we should expect to see.

Third, note that the second poll was “in the field” as we say for one more day than the first poll. Does that matter? It might if some very significant news event happened on the 9th of December that might affect the results.

What I don't see in this report is any explanation of how the people were contacted, but when I went to the pollsters' web sites, here is what I found on the Quinnipiac site:

From December 3 – 9, Quinnipiac University surveyed 2,692 registered voters nationwide with a margin of error of +/- 1.9 percentage points. Live interviewers call land lines and cell phones.

So if you dig you can get all of this. And note that they specifically mentioned calling cellphones as part of their sample.

One final thing to point out: if you accept a 95% confidence level, that means by definition that approximately one out of every 20 polls will be, to use the technical term, "batcrap crazy". That is why you should never assign too much significance to any one poll, particularly if it gives you results different from all the other polls. You are probably looking at that one-out-of-twenty poll that should be ignored. There is a human tendency to seize on it if it tells you what you want to hear, but that is usually a mistake. It is when a number of pollsters do a number of polls and get roughly the same result that you should start to believe it. That does not mean they will agree exactly; there is still the usual margin of error. That is why a poll that shows one candidate getting 51% of the vote and her opponent getting 49% will be described as a "dead heat". With a margin of error of two points, the leading candidate could be getting anywhere between 49% and 53%, assuming the poll is accurate and unbiased.
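You can watch that one-in-twenty effect happen by simulating many polls of the same electorate and counting how many miss by more than the margin of error. An illustrative sketch; the 52% "true" support level is made up:

```python
import random

TRUE_SUPPORT = 0.52  # hypothetical true level of support in the electorate
N = 1_000            # respondents per poll
MOE = 1.96 * (0.25 / N) ** 0.5  # standard 95% margin of error, about 3.1 points

def run_poll(seed):
    """One simulated poll: sample N voters and report the observed support."""
    rng = random.Random(seed)
    return sum(rng.random() < TRUE_SUPPORT for _ in range(N)) / N

polls = [run_poll(s) for s in range(2_000)]
miss_rate = sum(abs(p - TRUE_SUPPORT) > MOE for p in polls) / len(polls)
print(f"{miss_rate:.1%} of simulated polls missed by more than the margin of error")
```

Every one of those simulated polls was conducted "correctly", with a perfectly unbiased sample; roughly one in twenty still lands outside the margin of error purely by chance.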

Nov 04

Review of Beyond Fear: Thinking Sensibly About Security In An Uncertain World

Beyond Fear: Thinking Sensibly about Security in an Uncertain World by Bruce Schneier
My rating: 5 of 5 stars

Bruce wrote this book in 2003 as a response to 9/11 and how it led to changes in security practices in the U.S. He criticizes many of the security measures taken as "security theater": measures that make it look like something is being done without actually accomplishing anything useful. His criticisms are probably nothing terribly new to people in 2013, when many have come to similar conclusions, but what I think is more important in this book is that he attempts to lay out a rational way of thinking about security. Security can never be 100% in a world of human beings, and security always entails trade-offs that make it a cost-benefit decision. As an example, you would never hire an armed guard to protect the empty bottles you are saving for the 10-cent deposit. That just doesn't make sense. Bruce lays out a five-step analysis you can apply to any security plan, asking what you are trying to protect, what the costs of the protection are, whether the proposed solution will actually work, and so on. It is a good analysis and worth a read if you want to learn how to think intelligently about security.

View all my reviews

Nov 04

Review of The Code Book

The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography by Simon Singh
My rating: 5 of 5 stars

This book is a very good review of the history of encryption and explains the basic principles involved. It is a lot like David Kahn's The Codebreakers, but is available for a good deal less. Beginning with Herodotus and some secrecy measures from the Persian Wars, it moves forward through Arab scholars and medieval developments, right up to the asymmetric public-key encryption used today. Highly recommended for anyone who wants an overview of the issues but is not looking to dive into the mathematics.

View all my reviews

Aug 21

Review of The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal

The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal by M. Mitchell Waldrop
My rating: 5 of 5 stars

Having just read Katie Hafner's Where Wizards Stay Up Late, I was ready to tackle this book, which is both deeper and more ambitious. Where Hafner's book was purely about the origin of the Internet, Waldrop takes on the whole idea of personal computing. Licklider provides the focus for this book, for while he played a crucial role in promoting networking, his true aim was always what he termed a symbiotic partnership between humans and computers; for him, networking was just a necessary step to getting there. That is one of the reasons Licklider provided crucial support to Doug Engelbart, for instance. And even when Licklider was out of the picture (during the heyday of Xerox PARC, for instance), Waldrop keeps his focus on the development of the personal computer. If you like this kind of history and want to know just who did what in those early days, this book is indispensable.

View all my reviews
