Tuesday, November 4, 2014

经验教训

把自己的项目复习明白!

Linkedin

merging linkedlist
paint house with lowest cost
shorten URL service

Monday, November 3, 2014

PI面筋

PI面筋
What is the purpose of a database index? What is the tradeoff? 

Take n-length array with n-length words in it to find longest word that can chain to a 3 letter word (in the array) by taking out any one letter and becoming an anagram for a shorter word in the array.

Enumerate all NxN grids that can be composed of valid English words. 
Architect a chess program. 
An amusing question about finding "happy" numbers. 

write a function that returns the minimum edit distance between two strings and analyze its complexity.

How many characters in the front of a string would need to be added in order to make it a palindrome.

1. Discussion on past projects. One algorithm question related to a data structures.
2. Object oriented code design question
3. Algorithm questions on arrays and tress.
4. System design question.

explain hash tables (insertion/deletion/search time, how rehashing affects these times

Split a sentence into words (even though Pinterest is written in Python, they don't want you to just use Python's split function, but use a lower level language like C to take less memory)

countandsay
duplicate removal algorithm
k mostly frequently used words in a string
first missing number
implement a hashmap and all its methods

read through string and output number of times each digit occurs, reverse words in sentence, etc.

sentence partitioning
reading matrix diagonally

how would you desing pinterest’s architecture
given an array of words find what is and how long is the length of the longest common substring between two words in the array (prefix tree)
given an arrya of numbers see if you can get to index with 0 in it from an index by jumping through the array using the values in the array. So if you have [1,2,1,0,3] you can get to 0, from 0 by jumping 0, you can get to 0 from 3, by jumping 3 index down to 2 and then jumping 2 index up to 0..

Write a job scheduler. Input : list of jobs with dependencies. Output: list of jobs to be executed according to dependencies.
Count calendar days. Input: list of ranges [start,end], a range represent days i.e. [1,4] means days from 1 to day 4. Output: Total days included in the list of ranges. Answer was 1)Sort list, 2)Merge overlapping ranges and 3) Count/Sum the days of the resulting non-overlapping ranges. 

http://www.mitbbs.com/article_t/JobHunting/32570751.html
如果要给每个Pin加上一个price tag,怎么去evaluate这是否work?
(1) A/B testing -> 可以有好几种,讨论优劣性
(2) metrics to monitor -> click rate, impression, return user ratio, etc

上门:
1. 假设Pinterest的更新系统只能显示3条更新,怎么设计?更新可以是:用户评论、
加新的pin,repin等等,一共可能有一千多种。讨论各种方法的优劣性
回答:a ranking problem...

2. 给如下的数据格式:
<start_time, end_time, value>
比如有一组数据:
1, 3, 100
2, 4, 200
5, 6, 300
。。。
这些数据时间点可能有重合。在时间段2~3之间,value的和是100+200 = 300. 找出这
组数据中最高的value和。
回答1: 用一个数组,每个cell代表一个timestamp,然后扫一遍数据,increment相应
的cell。-》 面试官:时间连续怎么办?有没有更好的办法。
答案:把数据变成:<type, time, value>;然后按照时间排序。如果是start_time,就
+value,不然就-value:
int sum = 0;
int max = 0;

// sort by time

while(have more lines) {
    if(type is start) sum += value;
    else sum -= value;
    if(sum > max) max = sum;
}
return max;

3. 设计一个数据结构支持O(1)的insert, remove, find random(老题)

4. java arraylist里如果满了,怎么办?为什么?
答: make a new copy, size double. 原因是:double size的时候需要拷贝原来的n
个数据,当当前这个长度为2*n的arraylist再满的时候,至少还需要插入n个数据,这
样平均每个数据的cost是在O(1)级别的

5. 怎么做weighted random sampling?(老题)

6. 有产生5的随机数,怎么生成7的?(老题)

7. 怎么去找log中的异常?outlier detection